TESTS AND MEASUREMENTS 


IN INDUSTRIAL EDUCATION 




BY 
LOUIS V. NEWKIRK, Ph.D.
Director, Division of Industrial Arts, Chicago Board of Education 
AND 
HARRY A. GREENE, Ph.D.


Professor of Education and Director of Bureau of Educational
Research and Service, State University of Iowa 




NEW YORK 
JOHN WILEY & SONS, Inc.
London: CHAPMAN & HALL, Limited


Copyright, 1935, by 
LOUIS V. NEWKIRK AND HARRY A. GREENE 


All Rights Reserved 


This book or any part thereof must not 
be reproduced in any form without 
the written permission of the publisher. 


FOURTH PRINTING, AUGUST, 1949 


PRINTED IN THE U. S. A.


PREFACE 


The growth of interest in tests and measurements in industrial
education has been rapid in recent years. The last decade has produced
numerous significant investigations in the curricular aspects of these
special subjects, on which have been built newer and better materials
and methods of teaching. A renewed interest in the possibilities of
measurement of special aptitudes and achievement in the industrial
education fields naturally parallels this type of development.

This book is planned to fit into this program. It is designed to
bring to the attention of the shop teacher, and to students in training 
for this type of work, a simple and practical discussion of the essen- 
tial principles of educational measurements as applied to the teaching 
of shop and drawing courses. It is based upon a considerable number 
of years of experience on our part in the teaching of courses in educa- 
tional measurements and in methods in the industrial arts fields. In 
addition to these major functions, this book is planned to stimulate a 
renewed interest in the more adequate evaluation of student achieve- 
ment by teachers of industrial education who have already had some 
experience with the work. It brings together and evaluates many of 
the more important contributions to measurements in industrial arts 
and industrial education. We earnestly hope that it may also serve 
to stimulate further interest and work along these lines. 

In presenting this material we recognize the difficulty of covering in 
an adequate manner the many difficult problems. Throughout the
book, the aim has been to emphasize the practical rather than the
theoretical. It is not planned to displace general treatises on meas- 
urements or statistics. On the other hand, it is hoped that the straight- 
forward presentation of the problems of measurement in this subject 
may eliminate the necessity for technical training in measurements 
and statistics in order for the student or teacher to use this book 
effectively. 

We wish to acknowledge our great indebtedness to the many class- 
room teachers and supervisors who have contributed directly and in- 
directly to the materials presented in this discussion. The kindness 
of authors and publishers who have given permission for the
reproduction of many selected portions of their work and publications
is gratefully acknowledged. We are also indebted to Professor
Arthur B. Mays of the University of Illinois, who gave valuable ed- 
itorial criticisms; to Professor A. H. Edgerton of the University of 
Wisconsin for encouragement and valuable suggestions; to President
Butler Laughlin of the Chicago Normal College for editorial
suggestions; and to Professor Frank X. Henke of the Chicago Normal
College for illustrative drawings.

L. V. NEWKIRK
H. A. GREENE

Chicago, Ill.,
May, 1935.


CONTENTS

LIST OF TABLES

CHAPTER
I. INTRODUCTION
II. TYPES OF EDUCATIONAL TESTS
III. USES OF TESTS IN CLASSROOM AND SHOP
IV. SELECTION AND EVALUATION OF TESTS
V. MEASURABLE FACTORS IN INDUSTRIAL EDUCATION
VI. ADMINISTERING INDUSTRIAL EDUCATION TESTS
VII. INDUSTRIAL EDUCATION ACHIEVEMENT TESTS
VIII. INTELLIGENCE AND APTITUDE TESTS IN INDUSTRIAL EDUCATION
IX. TESTS IN RELATED EDUCATIONAL FIELDS
X. TESTING TECHNIQUES IN INDUSTRIAL EDUCATION
XI. CONSTRUCTION AND USE OF INFORMAL SHOP TESTS
XII. CONSTRUCTION AND USE OF SCALES FOR THE RATING OF INDUSTRIAL
     EDUCATION PROJECTS
XIII. RATING AND DEVELOPING PERSONALITY AND CHARACTER TRAITS
XIV. SUMMARIZING THE RESULTS OF TESTING
XV. INTERPRETING THE RESULTS OF TESTING
APPENDIX
INDEX


LIST OF TABLES

TABLE
1. Ratings Assigned Woodwork Samples
2. Major Factors Considered by Judges in Rating Three Woodworking
   Projects
3. Ratings Assigned Three Beginning Drawing Projects
4. Major Factors Considered by Judges in Rating the Three Drawings
5. Ratings Assigned Three Sheet-Metal Projects
6. Major Factors Considered by Judges in Rating the Three Sheet-Metal
   Projects
7. Summary of Ratings
8. Norms, Based on Non-Time and Time Situation
9. Analysis of Class Instructional Weakness in Home Mechanics
10. Number of Items in Newkirk-Stoddard Home Mechanics Test Answered
    Correctly
11. Desirable Types of Professional Information
12. Ten High-Ranking Home Mechanics Jobs
13. Ten High-Ranking Home Mechanics Jobs According to 75 Home
    Mechanics Teachers
14. A Rearrangement Question with Answers as Approved by 5 Tradesmen
15. Reliability Coefficients
16. Measurable Factors in Industrial Education
17. Examples of Operations That Determine Quality in Industrial
    Education Subjects
18. Reliability Coefficients in Nash-Van Duzee Woodwork Test
19. Reliability of Single Form of Newkirk-Stoddard Home Mechanics
    Test, 40 Minutes' Testing
20. Reliability of Both Forms of Newkirk-Stoddard Home Mechanics Test,
    80 Minutes' Testing
21. Norms for Newkirk-Stoddard Home Mechanics Test
22. Arrangement of Kuhlmann-Anderson Tests
23. Coefficients of Reliability and Validity on Minnesota Mechanical
    Ability Tests
24. Percentile Norms for Stenquist Assembling Tests
25. Reliability of Iowa Silent Reading Test, Advanced
26. Content of Objective Examination
27. Point Scores for Scoring Performance Tests
28. Ranking by Nine Judges
29. Rating of Specimens by Judges
30. Percentage of Judges Rating Each Specimen Better than Another
31. Percentage Deviations from Median
32. Sigma Differences Expressed as Deviations from the Median
33. Scale Differences between Specimens
34.
35. Scores in Number of Jobs Right on Newkirk-Stoddard Test of Home
    Mechanics
36. Scores in Table 35 Arranged in Descending Order of Size
37. Suggested Relation of Range of Scores and Size of Class Intervals
38. Data Arranged in Frequency Distribution
39. Comprehension Scores on Iowa Silent Reading Test
40. Distribution of Scores and Calculation of Arithmetic Mean
41. Distribution of Test Scores of 71 Ninth-Grade Pupils
42. Illustration of Need for Measures of Variability
43. Computation of Standard Deviation from Ungrouped Data
44. Computation of Standard Deviation from Grouped Data
45. Standard Deviation Technique for Assigning Class Grades
46. Correlation Table Showing Relation of Scores on Iowa Plane Geometry
    Aptitude Test
47. Percentage of Forecasting Accuracy for Specific Values of r
48. Assignment of Relative Ranks
49. Computation of Percentile Scores
50. End-of-Year Norms for Newkirk-Stoddard Test
51. Scores on Iowa Silent Reading Test by a Ninth-Grade Pupil
52. Grade Norms for Haggerty Reading Examination, Sigma 3
53. Percentile Equivalents of Scores on Iowa Plane Geometry Aptitude
    Test
54. Suggested Point Values Corresponding to Letter Grades


TESTS AND MEASUREMENTS IN 
INDUSTRIAL EDUCATION 


CHAPTER I 
INTRODUCTION 


1. Significance of Measurement in Industrial Education. 

Measurement in its various forms and phases appears to be recognized
as an integral part of good classroom procedure. In no instructional
field is there greater need for the application of the principles
of educational measurement than in industrial education. Industrial
education¹ teachers and supervisors need reliable measuring
instruments in order to give more adequate educational guidance, to evaluate
personality traits, to motivate learning, to study the effectiveness of 
teaching materials and methods, to measure pupil progress more accu- 
rately through the establishment of more definite standards of per- 
formance and through the diagnosis of pupil difficulties. The use of 
tests for such purposes in other fields is a well-established practice. 
The fundamental principles of scientific test construction and inter- 
pretation may be applied to the measurement problems of the shop 
and the drafting room when modified in the light of special needs. It 
is the purpose of this book to explain and illustrate many of the 
applications and modifications of recognized principles of
measurement to these specific fields.

The marked increase in the interest in educational measurements
on the part of teachers of industrial education is not surprising. It
is to be expected of a group of teachers who have had to face the many
problems of a new and growing unit of instruction. In many ways the 
teachers of industrial education are most fortunate. They are working
in a new and growing field of instruction which rapidly is becoming
organized in the light of modern educational objectives. They have
the advantages of all the methods and techniques that have been
developed for measurement in other fields. They are in a position to
utilize the good and discard the worthless results of earlier efforts.
A large number of accepted principles and practices for use in
constructing and interpreting measuring instruments are now available
for the evaluation of the products of teaching. From the standpoint
of professional qualifications and classroom efficiency it is the
industrial education teacher's business to understand these
well-established principles and their special application and use in
their own fields of instruction.

¹ Throughout this text the term industrial education is used to include the
general courses of the secondary school variously known as manual training,
manual arts, industrial arts, and industrial arts education, and the vocational
work of the continuation school, trade school, and evening schools. The
fundamental measurement problems in all these courses are similar, although the
objectives of the courses vary from cultural to strictly vocational.


2. Teachers’ Marks in Industrial Education. 

The earlier studies of teachers’ marks revealed the fact that such
measures were entirely unsuited for the evaluation of pupil
achievement since they were extremely subjective and quite unreliable.
However, these earlier studies confined themselves mainly to teachers’
marks used in the rating of accomplishment as it is revealed on the
written page. It is true that many of the teachers’ marks in
industrial education are given on this same basis, but there is also the
matter of rating actual projects and drawings from the shop and
drafting room as well as the manipulative skills. The absence of precise
information on the exact subjectivity and unreliability of such marks
in the industrial education fields prompted the authors to carry on a
series of investigations in this field. The results are presented here as
further evidence of the need for improved methods of measuring the
accomplishment of students in these subjects.

Three samples from each of the fields of woodworking, drawing,
and sheet metal were selected for study. The woodworking projects
consisted of one gray wren-house, one red wren-house, and one rolling
pin. The drawings were simple inked drawings, and were known as
numbers 1, 6, and 7. (See Fig. 1.) The sheet-metal projects
consisted of three funnels which were numbered 1, 2, and 3.

A group of experienced industrial education teachers cooperated in
rating these projects. The marking was done through individual or
group conferences with the teachers and in accordance with the
following instructions:


The teachers were asked to rate the project only on the basis of what
they considered perfect, and to give no consideration to grade
standards for such a project. The factor of grades was avoided, since it was
the main purpose here to discover how much variability there is among
teachers in their concept of what is good workmanship. The marks
and rating factors obtained on beginning woodworking, drawing, and
sheet-metal projects are given in Tables 1 to 6 inclusive.

Fig. 1. Samples for rating projects (wren houses, funnels, drawings).

Results of Rating Woodwork Samples. Table 1 shows a range of
40 to 81 for the ratings of the red wren-house, 70 to 96 for the gray
wren-house, and 75 to 98 for the rolling pin. There is also a tendency
for the ratings to bunch at certain points on the scale. The letter
grades show in a rough way that pupils with projects of the same basic
quality might be assigned almost any of the variable passing marks,
depending upon which shop teacher rated them. It must be
remembered, however, that these results are not in all respects comparable
to the actual situation in the shop or at the drawing table. In either
of these situations the teacher almost certainly would grade on the
class average. Furthermore, it would be necessary to take into
consideration the physiological development of the pupils, their
intelligence quotients, and quite likely their mechanical aptitudes. There
would also be the factor of the pupil's personality and class attitude
which might affect the teacher’s judgment.

TABLE 1
RATINGS ASSIGNED WOODWORK SAMPLES
(39 teachers)

     Red Wren-house            Gray Wren-house             Rolling Pin
Rating  Frequency  Mark   Rating  Frequency  Mark   Rating  Frequency  Mark
  81        1       C       96        1       A       98        1       A
  [?]       3       C       95        5       A       [?]      [?]      A
  78        4       D       94        1       A       95        6      [?]
  75       10       D       92        3       B       94        2      [?]
  74        1       D       90       11       B       93        2      [?]
  73        1       D       87        1       B       92        3      [?]
  70        3       F       85       12       C       90       14      [?]
  65        2       F       80        2       C       85        2       C
  61        2       F       75        2       D       83        1       C
  55        1       F       70        1       D       80        1       C
  53        1       F                                 78        1       D
  51        1       F                                 75        3       D
  50        6       F
  40        3       F


The data given in Table 2 show considerable variation in the factors
mentioned in scoring the projects. There is a distinct preference,
however, for rating under the following headings: results of tool
operations, finish, design, fasteners, and utility. It appears that
much of the variation in the rating of the projects is due to the
varying amounts of emphasis placed on these factors by the teachers
themselves.

TABLE 2
MAJOR FACTORS CONSIDERED BY JUDGES IN RATING THE THREE WOODWORKING
PROJECTS
(39 teachers)

Rating Factors                          Frequency
Finish                                     31
Joints                                     30
Proportion                                 28
Squareness                                 22
Nailing                                    19
Utilitarian                                13
Commercial standard of workmanship         13
Design                                     10
Sanding                                     8
Fitting                                     7
Gluing                                      5
Dimensions                                  4
Choice of materials                         4
Planing                                     4
Shape                                       4
Accuracy                                    3

Rating Drawing Samples. The ratings of the drawings (Table 3)
show more variation and less tendency to cluster than is found in the
ratings of the woodworking projects. The two poorer drawings
(Samples 1 and 6) show the most variation and the best drawing the
least, although this drawing (sample number 7) has a variation equal
to the range of all of the passing marks.

TABLE 3
RATINGS ASSIGNED THREE BEGINNING DRAWING PROJECTS
(27 teachers)

        Sample 1                  Sample 6                  Sample 7
Rating  Frequency  Mark   Rating  Frequency  Mark   Rating  Frequency  Mark
  94        1       A       88        1       B       99        1       A
  90        2       B       87        1       B       98        1       A
  85        3       C       86        1       C       97        1       A
  84        1       C       85        2       C       95        4       A
  82        1       C       82        1       C       93        2       A
  80        7       C       80        2       C       90        8       B
  75        4       D       78        1       D       85        6       C
  70        3       D       75        3       D       80        3       C
  65        1       F       70        6       F       75        1       D
  60        1       F       65        3       F
  40        1       F       60        1       F
  25        1       F       55        2       F
                            50        1       F
                            30        1       F

The frequencies given in Table 4 indicate that the drawing teach- 
ers considered the factors of lettering, figures, and lines most often, 
but they also took into consideration neatness, dimensions, erasures, 
arrowheads, and accuracy with fair consistency. On the whole, the
drawing teachers showed a slightly greater variation than the
woodworking teachers in their ratings.

TABLE 4
MAJOR FACTORS CONSIDERED BY JUDGES IN RATING THE THREE DRAWINGS

Rating Factors                          Frequency
Lettering and figures                      62
Lines                                      62
Neatness                                   27
Dimensions                                 22
Erasures                                   18
Arrowheads                                 16
Accuracy                                   13
Cleanliness                                 8
French curves                               8
Placement                                   8
Completeness                                7
Spacing                                     5
Projection                                  5
Joints                                      4
General appearance

Rating Sheet-Metal Projects. Differences quite typical of all such
ratings are found for the three sheet-metal projects (Table 5). It is
apparent from Table 6 that soldering was the factor considered most
often.

TABLE 5
RATINGS ASSIGNED THREE SHEET-METAL PROJECTS
(12 teachers)

        Sample 1                  Sample 2                  Sample 3
Rating  Frequency  Mark   Rating  Frequency  Mark   Rating  Frequency  Mark
  90        1       B       95        2       A       96        1       A
  87        1       B       92        1       B       90        3       B
  85        2       C       90        1       B       85        1       C
  80        1       C       82        1       C       80        4       C
  7[?]      1       D       80        1       C       78        1       D
  75        1       D       75        5       D       75        1       D
  70        1       F       70        1       F       60        1       F
  60        1       F
  55        1       F
  50        1       F
  40        1       F

TABLE 6
MAJOR FACTORS CONSIDERED BY JUDGES IN RATING THE THREE SHEET-METAL
PROJECTS

Rating Factors                          Frequency
Soldering                                  33
Proportion                                 13
Seams                                      13
Roundness                                  12
Shape                                      11
Wiring                                      9
Neatness                                    8
Accuracy                                    6
Forming                                     6
Design                                      6
Roughness                                   5
Joints                                      4
Curve                                       2
Crimping                                    2

Summary of Ratings. The results indicate that shop and drawing
teachers are not consistent in their ratings of the same group of
projects or drawings, and that they do not agree on the relative
importance of the factors considered in making the ratings. There are
probably enough differences in the estimation of quality in these
specimens to introduce serious errors in measurement based on such a
procedure. Table 7 summarizes the range in ratings and the
corresponding letter marks for the nine projects included in the study.


TABLE 7
SUMMARY OF RATINGS

Project        Range of Ratings    Corresponding Letter Marks
Woodwork       41; 26; 23          C-F; A-D; A-D
Drawing        69; 58; 24          A-F; B-F; A-D
Sheet metal    50; 25; 38          B-F; A-F; A-F
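The range reported in Table 7 is simply the highest rating minus the lowest rating given to a project. As a minimal sketch, assuming the drawing ratings are read from Table 3 as (rating, frequency) pairs, the drawing row of Table 7 can be recomputed as follows. The names `drawing_ratings` and `rating_range` are illustrative and do not come from the text.

```python
# Ratings for the three drawings, transcribed from Table 3 as
# (rating, number of teachers giving that rating) pairs.
drawing_ratings = {
    "Sample 1": [(94, 1), (90, 2), (85, 3), (84, 1), (82, 1), (80, 7),
                 (75, 4), (70, 3), (65, 1), (60, 1), (40, 1), (25, 1)],
    "Sample 6": [(88, 1), (87, 1), (86, 1), (85, 2), (82, 1), (80, 2),
                 (78, 1), (75, 3), (70, 6), (65, 3), (60, 1), (55, 2),
                 (50, 1), (30, 1)],
    "Sample 7": [(99, 1), (98, 1), (97, 1), (95, 4), (93, 2), (90, 8),
                 (85, 6), (80, 3), (75, 1)],
}


def rating_range(pairs):
    """Return highest rating minus lowest rating for one project."""
    ratings = [rating for rating, _frequency in pairs]
    return max(ratings) - min(ratings)


# One range per drawing, in the order used in Table 7.
ranges = {name: rating_range(pairs) for name, pairs in drawing_ratings.items()}
print(ranges)
```

The three values obtained this way agree with the drawing row of Table 7 (69; 58; 24). The range depends only on the two extreme ratings, so a single unusually severe or lenient judge is enough to widen it.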


3. Need for a Knowledge of Measurements in Industrial Education. 


The rating of shop projects and drawings is difficult; it requires 
a complex fusing of judgments based on a group of variable factors. 
Yet, psychologically, the rating of shop projects and drawings is little 
different from the rating of an English theme or a paper in
mathematics. In English the factors to be considered may be spelling,
sentence structure, paragraphing, punctuation, etc.; and in shop
subjects the judgment may be based on such factors as tool processes,
design, utility, finish, and fasteners. Teachers vary greatly in their
concept of what constitutes perfection in a project or drawing. The
same project or drawing looks different to different individuals, and
quite probably to the same individual under different circumstances.
Marks assigned by teachers of sho 

are subject to the same ty 

achievement. The mı 


uccessfully applied t 
shop w 


5 accurate measures of the results of teach- 
important need on the part of teachers of 
t better, in order that they may be in a 


complish this, industri i 
1 levels of general i 




tudes, informational background, interests, appreciations, and emo- 
tional traits and attitudes of their students. 


SUMMARY EXERCISES FOR DISCUSSION 


1. What specific factors appear to make objective measurement in industrial
education quite difficult?

2. Rate a project in woodworking, drawing, or metal working, on a percentage 
basis, and record the characteristics of each project which influenced you 
most in assigning the marks. 

3. If you have access to a class, have each student mark independently a project 
in each of the above fields, and compare the marks as to variability, follow- 
ing the procedure shown in Table 1. 

4. Tabulate the characteristics of each project that were mentioned by the
students as being considered in marking the project.

5. In your judgment, what factors largely account for the wide variation in marks 
of achievement assigned to products of a similar quality? 

6. Suggest a number of devices which would seem to have possibilities for in- 
creasing accuracy in the assignment of marks to shop projects. 


SELECTED REFERENCES 


Hunter, William L., “Objective Tests in Shop Courses,” Industrial Education
Magazine, Vol. 29: 433-39, No. 12, June, 1928.

Kelly, F. J., Teachers’ Marks, Teachers College Contribution to Education, No.
66, p. 11, Columbia University, 1914.

Leavitt, F. M., “Standardized Measurements in the Field of Industrial Arts,”
Industrial Arts Magazine, Vol. 8: 132, April, 1919.

Mansperger, D. E., “Testing the Industrial Arts in Junior and Senior High
School,” Industrial Arts Magazine, Vol. 18: 49, 1929.

Meyer, Max, “The Grading of Students,” Science (N. S.), Vol. 28: 243-252.

Nash, Harry B., and Van Duzee, Roy R., “The Standard Test in Industrial
Arts,” Industrial Arts Magazine, Vol. 19: 125-29, No. 4, April, 1930.

Newkirk, L. V., “Reliability of Shop Teachers' Marks in Rating Shop Projects
and Drawing,” Industrial Arts and Vocational Education Magazine, Vol.
20: 123, April, 1931.

Smith, Homer J., “Objective Measurement in Industrial Education,” Industrial
Education Magazine, Vol. 31: 331-336, No. 9, March, 1930.

Starch, Daniel, and Elliott, Edward C., School Review, Vol. 20: 442-57;
21: 254-59; 26: 676-81.

Swope, Ammon, “How to Construct Objective Tests in Industrial Subjects,”
Industrial Education Magazine, Vol. 30: 7-9, No. 1, July, 1928.


CHAPTER II 
TYPES OF EDUCATIONAL TESTS 


4. Essay-Type Tests. 


The two general classifications of educational tests in common 
usage are objective and essay-type tests. Objective tests are so con- 
structed that they can be scored without any guessing or subjective 
judgment on the part of the user. In the traditional or essay test a 
number of questions are made out covering the material to be tested 
in a general way with statements similar to the following: 


1. Name ten common cabinet woods.
2. How are the grades of sandpaper indicated?
3. What is the difference between spindle turning and face-plate turning?
4. What is varnish?
5. What is the principle of the internal-combustion engine?


The average teacher using the essay-type examination makes up five
or ten questions on the subject being tested (drawing, woodwork, sheet
metal, auto mechanics) and then allows the pupils thirty to fifty
minutes to answer them. The directions for administering such a test
usually consist in a statement reminding the pupils to write their
names on each sheet before handing in the test.

The scoring of the essay-type test is a difficult problem, some
phases of which were introduced in the preceding chapter. The
teacher’s principal object is to secure an estimate of the pupil’s
mastery of the informational content of the course. In scoring an
essay-type examination, many factors enter which influence the
teacher’s judgment but which have little to do with the actual
evaluation of the student’s knowledge of the subject. Among these are
English, including spelling, sentence structure, and general
composition; mechanical features of the examination such as neatness,
legibility, use of pen or pencil, and size of paper used; the quantity
written; the sampling of the subjects represented by the questions;
the teacher's attitude toward the pupil, or the pupil's attitude
toward the teacher. The final mark is influenced by unknown
combinations of these factors. This means that the mark on the test is
an entirely inadequate expression of any one factor, and hence is an 
unreliable measure of the entire field covered by the test. Thus the 
essay test at best can furnish only the roughest measure of achieve- 
ment. 

It is frequently argued by those defending the essay-type test that 
it gives the student valuable training in the mechanics of writing, 
spelling, thought organization, and expression. If this were actually 
accomplished, the argument would be sound, but even an unbiased 
observer of students engaged in writing essay-type examinations must 
admit that the rush and strain of getting the words down on the exam- 
ination paper leaves very little opportunity for the training in thought 
organization and expression which it should give. It seems safe to 
conclude, therefore, that if a teacher desires to measure a pupil’s
ability to spell, write, and express himself, he should use tests designed 
for that purpose and not confuse the issues. 


5. Objective Tests. 

Properly constructed objective-test exercises are not influenced ap- 
preciably by the conflicting factors which appear to invalidate meas- 
urement based on the essay-type question. Objective exercises are 
marked by two important and related features. These are (1) brevity 
of pupil response, and (2) absence of personal judgment in scoring 
the test exercises. These features of the objective exercise make it 
equally suitable for use in the teacher-made informal examination, 
and in the more carefully constructed standardized test. 

Objective exercises are stated in such forms that the pupil is able 
to indicate his understanding by the briefest and simplest of physical 
responses, usually consisting of underlining or encircling a single word 
or phrase. Because of this brevity of pupil response, many more exer- 
cises may be submitted to the pupil, thus providing a more complete 
sampling or coverage of the subject-matter. The quality of the an- 
swers need not be evaluated by the teacher but may be scored as right 
or wrong by comparison with an answer key. The use of the objective 
form of the test exercise thus makes it possible for different teachers 
to score the same test papers and secure identical results. A test exer- 
cise which is perfectly objective may be scored repeatedly at widely 
separated intervals and by different individuals without significant 
variation in results. Such accuracy in grading test exercises can be 
obtained only when the exercises are constructed in accordance with 
certain rather well-known specifications. 
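The scoring procedure described above, comparison of brief responses with an answer key, can be illustrated with a short sketch. The answer key, the pupil's responses, and the function name `score_objective_test` are hypothetical and serve only to show why two scorers using the same key must arrive at the same score.

```python
def score_objective_test(responses, answer_key):
    """Count the items whose response matches the key exactly.

    Items the pupil omitted are simply counted as wrong; no judgment
    about the quality of an answer is involved.
    """
    return sum(
        1
        for item, correct in answer_key.items()
        if responses.get(item) == correct
    )


# Hypothetical five-item multiple-choice key (item number -> keyed choice).
answer_key = {1: "b", 2: "d", 3: "a", 4: "c", 5: "b"}

# Hypothetical pupil paper: item 3 is wrong and item 5 was omitted.
pupil_responses = {1: "b", 2: "d", 3: "c", 4: "c"}

# Two teachers applying the same key to the same paper.
score_by_teacher_a = score_objective_test(pupil_responses, answer_key)
score_by_teacher_b = score_objective_test(pupil_responses, answer_key)
print(score_by_teacher_a, score_by_teacher_b)
```

Because the score is a count of exact matches, it does not depend on who does the scoring or when, which is the sense in which the exercise is objective.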




6. Tests and Scales. 


Measuring instruments are roughly divided into tests and scales. 
This distinction is of some value, but at times it is confusing because 
some tests resemble scales or contain certain features of scales as an 
essential part of their construction. Generally speaking, a test is a 
measuring instrument used for the evaluation of any knowledge, qual- 
ity, or ability. It may measure degree of achievement, mental ability, 
aptitude, or character traits. It may be made up of items of uniform
difficulty, or it may be composed of a series of items of uniformly 
increasing difficulty or value. In the former case it is a rate test; in 
the latter, it is a power test. The process of determining the
difficulty or value of test items is called scaling. The use of this term
possibly accounts for much of the confusion concerning tests and
scales.


A measuring instrument is a scale to the extent that it ranks
accomplishment directly in terms of systematic levels, grades, or ages.
An instrument which is made up of scaled items (items of system-


quite often by the s 
For example 
gradually increasing difficulty cou 


creasing percenta 
might be treated as a scale if the scores on it were expressed in terms 
of the scale value of the last wo 


а test. 
у hybrids гези] 


ting from the cross- 
"That is, they 


en on the scale most 
the use of such quality 


amount of subjectivity into the measurement, since the teacher's
judgment is necessarily involved in assigning the quality rating. Such
scales are used for the rating of handwriting, free-hand lettering,
drawing, electrical splicing, soldering,
wood-boring, riveting, forging, finishing, and many other products. 
The techniques used in the construction and use of these scales are 
discussed and illustrated in Chapter XII. 

7. Standardized and Informal Objective Tests. 

Objective measuring instruments are further designated as stand- 
ardized tests and teacher-made tests and scales. Both types are useful 
in measuring achievement in industrial education. A test is standard- 
ized (1) if it is composed of exercises that have been selected in the 
light of usual teaching practice and evaluated as to innate difficulty, 
and (2) if it is accompanied by norms or standards permitting the 
interpretation of results in levels of accomplishment. Standardized 
tests are of value in comparing the accomplishment of a class with 
general standards and in comparing groups in different schools in the 
same system. Teacher-made or informal objective tests are similar to 
standardized tests except that the test items are selected directly from 
the content of the course of study. Usually the items in such tests 
more closely parallel the material taught but are less carefully formu- 
lated and evaluated than standardized tests. Generally, too, no 
norms are available, but useful levels of accomplishment may be 
developed from year to year by recording the scores each time the 
test is given. Teacher-made objective tests are extremely useful in 
measuring achievement and diagnosing instruction in the shop. 
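The suggestion that useful levels of accomplishment may be developed by recording the scores each time a teacher-made test is given can be sketched as a simple local norm. The example below is an illustration under stated assumptions: the recorded scores are invented, and the percentile-rank rule used (per cent of past scores below the new score, plus half of those equal to it) is one common convention, not a procedure taken from this chapter.

```python
def percentile_rank(score, past_scores):
    """Per cent of recorded scores falling below `score`,
    counting half of the scores exactly equal to it."""
    if not past_scores:
        raise ValueError("no recorded scores to compare against")
    below = sum(1 for s in past_scores if s < score)
    equal = sum(1 for s in past_scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(past_scores)


# Hypothetical scores recorded from earlier classes on the same shop test.
recorded_scores = [10, 20, 20, 30, 40]

# Where a new pupil's score of 20 stands against the accumulated record.
rank_of_20 = percentile_rank(20, recorded_scores)
rank_of_40 = percentile_rank(40, recorded_scores)
print(rank_of_20, rank_of_40)
```

With only a handful of recorded scores such ranks are rough; they become steadier as each year's scores are added to the record.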

From the standpoint of their administration, standardized and 
teacher-made tests may be classified as written, oral, and performance. 
Psychologically there is little difference, because, after all, they are 
all performance tests. It has been found advantageous in testing
different types of industrial education achievement to have the pupils
write the responses to some items, respond orally to others, and in
many cases express their knowledge by modifying material through
the use of tools and machines. The fundamental thing to note here is
that in order to measure scientifically it is necessary to secure a
response which can be rated objectively and compared with the same
response made by others.

8. Classification of Educational Tests. 
Educational measuring devices may be classified according to their 

use and characteristics into achievement, diagnostic, prognostic, and
intelligence tests. Each of these types of instruments is useful in
measuring class and individual accomplishments, and in revealing the


general and special capacities of students in the industrial education 
subjects. 

Achievement Tests. Achievement tests measure abilities or prod- 
ucts acquired from the school or other types of educational experience 
of the pupil. Such tests may be standardized, or they may be informal
examinations made by the teacher. Considerable attention is given in 
this book to the problems arising out of the construction, use, and 
evaluation of achievement tests. 

Diagnostic Tests. Diagnosis is really one of the major underlying 
purposes of all achievement testing. In fact, it may be said that all
general achievement tests are diagnostic to a degree. Most achieve- 
ment tests, however, fail to furnish adequate diagnostic information 
because of the large number of skills they cover and because of the 
difficulty of securing a sufficiently detailed interpretation of the results. 
Diagnostic tests are specially constructed achievement tests designed 
to discover the exact identity and location of the pupils' strengths and 
weaknesses in subject-matter mastery. The development and use of 
such tests mean, of course, that the subject-matter itself has been 
analyzed to the point that the basic or underlying skills are clearly
identified. It is fairly safe to assume that subject-matter fields in
which detailed diagnostic tests are not available have not yet been
subjected to this type of analysis. 

Tests of this diagnostic or analytical character, were they avail- 
able, would be most useful to industrial education teachers in discov- 
ering what is already known by the pupil and thus indirectly in find- 
ing what remains to be mastered. This is really an inventory use of 
the tests. Genuine diagnostic tests have been slow to appear in indus- 
trial education subjects. The Newkirk-Stoddard Home Mechanics 
Test¹ (see page 115 for extracts from this test), though not strictly
diagnostic, furnishes a useful analysis of instruction in home
mechanics. It may be used also to determine how well a school is
teaching the outstanding home mechanics jobs or what jobs the individual
pupils are best acquainted with. Hunter's Shop Tests² are further
illustrations of tests with some diagnostic value. For example, in
this series of measures on woodwork there are tests on tools, fasten- 
ings, trade names, reading rules, wood finishing, and others. 

¹ Newkirk, L. V., and Stoddard, George D., Newkirk-Stoddard Home
Mechanics Test, Bureau of Educational Research and Service, State
University of Iowa, Iowa City, Iowa, 1928.

² Hunter, Wm. H., Shop Tests, The Manual Arts Press, Peoria, Illinois.

Prognostic Tests. One of the very significant features of modern
measurement in education is its emphasis on prediction. Tests of
general mental ability are useful to the extent that they predict a pupil's
general level of accomplishment. Prognostic tests are measures of 
specialized aspects of intelligence. The purpose of such tests is to 
provide the basis for accurate prediction of future achievement in 
specialized fields on the basis of present performance on some funda- 
mental underlying elements of the subject. Prognostic tests are de- 
signed to measure specific abilities underlying achievement in a par- 
ticular subject-matter field rather than the achievement itself. 
Aptitude or prognostic tests in industrial education should be most 
useful in determining the probability of success of a student in such 
subjects as drawing, machine shop, carpentry, bricklaying, cabinet 
making, or in any other special field. Tests of mechanical ability have 
value in predicting probable future success in industrial education 
subjects. 


Intelligence Tests. There are many definitions of intelligence, and
many different ways and means of measuring it. In general,
intelligence is the capacity of the individual to adapt himself to novel situa-
tions. It is the power of the individual to learn. In actual practice, 
intelligence is usually measured in terms of the extent to which the 
individual has applied this power in the acquisition of information and 
skills in a number of specific and mainly unrelated fields. In a sense, 
general mental ability is like a cable composed of many strands and 
fibers of varying size and quality, each representing some particular 
phase of ability. The intelligence test is merely a device for taking 
a cross-section of this cable. If the measuring device reveals the large 
and important strands of the cable it is a valid instrument.


SUMMARY 


The two general classifications of educational tests in common
usage are objective and essay-type tests. The objective test may be
scored without the subjective judgment of the teacher. The grading
of the essay type of examination presents a real problem and is
influenced by the subjective judgment of the teacher. A test exercise
which is perfectly objective may be scored repeatedly at widely
separated intervals without significant variation in results.

Measuring instruments are roughly divided into tests and scales.
A test is a measuring instrument used for the evaluation of any
knowledge, quality, or ability. A scale is a measuring instrument that ranks
accomplishment directly in terms of systematic levels, grades, or ages.
Objective measuring instruments are further designated as
standardized and teacher-made tests and scales. A test is standardized when
it is composed of exercises that have been selected in the light of usual
teaching practice, evaluated as to innate difficulty, and is
accompanied by norms or standards permitting the interpretation of results in
levels of accomplishment.

Achievement tests measure abilities or products acquired from the 
school or other types of educational experience of the pupil. Diag- 
nostic tests are specially constructed achievement tests designed to 
discover the exact identity and location of the pupil's strengths and 
weaknesses in subject-matter mastery. Prognostic tests may be 
thought of as measures of specialized aspects of intelligence. Intelli- 
gence or general mental ability may be described as the power the 
individual has to adapt himself to novel situations. Intelligence tests 
are classified as group and individual, depending on the method of 
administration they employ. 


SUMMARY EXERCISES FOR DISCUSSION 


1. What special features distinguish the objective test from the essay-type test?

2. Enumerate as many as possible of the special factors which distinguish stand- 
ardized tests from informal objective tests. 

3. What does the process of standardization of a test imply? 

4. Illustrate the different types of educational tests, using materials from the 
industrial arts field. 

5. In what specific ways are prognostic tests different from tests of general 
mental ability? 

6. What distinguishes a test from a scale? 

7. What qualities distinguish a rate test from a power test? 

8. Suggest specific ways in which quality scales may be particularly useful in 
industrial arts classes. 

9. What types of measuring instruments seem to have the greatest possibilities 
of practical value in industrial education? 

10. Do you think it will ever be possible to develop genuinely diagnostic tests
in this field of instruction? Why?




CHAPTER III 
USES OF TESTS IN CLASSROOM AND SHOP 


9. Tests as Related to Instruction. 


The uses of educational tests for administrative, supervisory, re- 
search, and survey purposes, important as they are, do not represent 
their most vital and important functions. In the past so much em- 
phasis has been given to these particular uses that the teacher often 
lost sight of their real utility in the solution of his individual instruc- 
tional problems. The recent development of reliable, valid, and highly 
detailed measuring instruments designed to parallel closely the subject- 
matter content taught by the teacher has caused him to shift his 
point of view. He realizes now that tests are most important supple- 
ments to other instructional material, and that without them he can 
scarcely hope to work at his highest level of instructional efficiency. 
He notes that modern tests are usually well made, detailed,
comprehensive, and analytical. He sees that with this type of instrument
available it is possible for him to test as he teaches; to chart his 
instructional course from accurate and objective observations. Mod- 
ern tests give the teacher a chance to discover where emphasis should 
be placed, and to determine when a satisfactory level of control has 
been attained. 

The busy classroom teacher can hardly be expected to construct 
tests which will possess all the merits of a carefully constructed stand- 
ard test. This assumes a breadth of knowledge of the subject-matter 
and a training in the technique of test construction which most teach- 
ers do not have. Even if there were perfect subject-matter mastery 
and a thorough knowledge of the making of tests, it is doubtful if the 
typical classroom teacher should be expected to spend his time in this 
way when, in most fields, other, better-made, and more economical 
materials are available. Yet, the teacher should certainly not be 
forced to depend upon his general observation of his pupils for his 
information concerning their strengths and weaknesses. 

There are times and conditions, however, in which the informal
objective or teacher-made test is very useful. The teacher-made
objective-type shop test serves its most useful function in measuring the
relative achievement of the individual members of a class. Quite often
standardized tests contain items that are not taught in the course.
Frequently they do not contain items which are taught. Teacher-made
tests can be designed to fit the specific needs of the shop
teacher's own course. Teacher-made achievement tests can be used for
diagnosis of special difficulties, for the motivation of learning, for
the measurement of accomplishment and assigning shop marks; but
since they do not have norms, the results obtained cannot be
compared with data from other schools. However, the industrial
education teacher can study his success as a teacher by comparing results
from semester to semester and from year to year.


10. Specific Uses of Tests. 
The specific applications of tests to industrial education are
discussed under the following general topics:

1. The measurement of class and pupil achievement.
2. The establishment of standards and norms of performance.
3. The motivation of learning.
4. The determination of efficiency of instruction.
5. The placement and guidance of pupils.
6. The evaluation of teaching materials and methods.

This broad scope of usefulness indicates that measurement is a
fundamental factor in teaching industrial education subjects and that it is
largely through the application of measurement to these subjects that
adequate teaching methods and materials will be developed.


11. Measurement of Class and Pupil Achievement.


The very principles lying back of the construction of educational
tests almost guarantee their usefulness to the classroom teacher in
evaluating individual pupil and class accomplishment. The selection
of the test items to cover the basic portions of the course of study
which is or should be taught provides one basis for comparison. The
form in which the test exercises are stated eliminates the personal
equation of the individual teacher. The length of most of our
approved standard tests guarantees consistency in the results of their
use. The existence of norms and standards gives definite meaning
to their scores. It is, therefore, a relatively simple matter for the
classroom teacher to secure an accurate measure of the
accomplishment of his class.

The use of a standard test in almost any selected subject makes
possible the direct comparison of the individual pupils in the class on
an objective basis. The simple procedure of determining the average
of the scores made by the class permits a direct comparison of this 
particular class with other, comparable classes in the building or 
system. Another very useful type of comparison is one which is fre- 
quently made between achievement at the beginning and at the end of 
a period of instruction. Each particular type of comparison serves 
its own purpose of assisting the teacher in determining the relative at- 
tainment and progress of his class. 
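
The two comparisons described above, class against class and beginning against end of instruction, can be sketched in a few lines of Python. This is only an illustration: the scores and the function names `class_average` and `average_gain` are hypothetical and are not taken from any published test.

```python
from statistics import mean


def class_average(scores):
    """Arithmetic mean of one class's scores on the same test."""
    return mean(scores)


def average_gain(initial_scores, final_scores):
    """End-of-period class average minus start-of-period class average."""
    return class_average(final_scores) - class_average(initial_scores)


# Hypothetical end-of-period scores for two comparable classes.
class_a = [42, 55, 61, 48, 70]
class_b = [50, 58, 66, 54, 72]

# Hypothetical scores for class A at the beginning of instruction.
class_a_start = [30, 41, 45, 36, 52]

print("Class A average:", class_average(class_a))
print("Class B average:", class_average(class_b))
print("Class A gain:", average_gain(class_a_start, class_a))
```

The comparison is meaningful only when both classes took the same test under comparable conditions, as the text requires.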


12. Establishment of Goals of Attainment. 


The fact that a test has been put through the process known as 
standardization gives it a distinct value in the classroom which an 
informal test does not have. The establishment of standards or norms 
for a test sets up in an objective way the goals to be attained in the 
course. The determination of whether or not a wood joint, a solder 
joint, a rope splice, a wire connection, a hem, a type of stitch, a
drawing, a sample of freehand lettering is acceptable is not necessarily a
matter of individual teacher judgment. In many of these fields
objective standards on tests and scales establish these levels of
attainment.

The comparison of results obtained from shop or laboratory
projects with norms and standards for tests and scales gives the teacher an
accurate indication of the achievement of his class in relation to other
classes at the same experience or grade level. Experience shows that
it is very helpful for an industrial education teacher to be able to
evaluate his teaching success in terms of other teachers' accomplishments.
If the results are consistently lower and the test items cover
the course of study in a suitable manner it indicates the need for
improved methods on the part of the teacher, or else reveals very low
aptitude on the part of the pupils.

An example of test norms in the industrial education field is given
in Table 8, which shows data on norms for the Nash-Van Duzee
Woodwork Test. The norms (medians) show the average achievement
scores of pupils on this test according to the number of minutes of
instruction they have had. Any teacher capable of giving this test can
compare the median of his class scores with the general average over
the United States. In a large city the average accomplishment of
different classes in the same school system may be compared in a similar
way. Table 8 gives norms from the Nash-Van Duzee Woodwork
Test I, Scale B.¹ This table shows the norms on the basis of semesters
of work and amount of instruction in minutes. Thus the teacher will
be able to compare the median achievement of his class with the
achievement of other similar classes that have had the same amount
of instructional time or that have been in the course the same number
of semesters.

¹ Nash, Harry B., and Van Duzee, Roy R., Woodwork Tests, The Bruce
Publishing Company, Milwaukee, Wisconsin.

TABLE 8

SHOWING MEDIAN SCORE NORMS, BASED ON A NON-TIME AND A TIME SITUATION

Junior High School

Semester               | First | Second | Third | Fourth | Fifth | Sixth       | Seventh
End of semester work   | 1400  | 2400   | 3400  | 4600   | 6000  | 9000        | 17,000
(minutes)              |       |        |       |        |       |             |
Non-time median score  | 36    | 45     | 55    | 60     | 66    | 75          | 83
Time median score      | 44    | 47     | 53    | 58     | 64    | [illegible] | 80

Senior High School

Semester               | Eighth | Ninth and up | Possible Score
End of semester work   | 25,000 | 32,000       |
(minutes)              |        |              |
Non-time median score  | 90     | 105          | 184
Time median score      | 86     | 101          | 15
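
A minimal sketch of the comparison the text describes, setting a class median against the published norm for the same number of semesters. The non-time medians are copied from Table 8; the class scores, the name `compare_with_norm`, and the choice to key "ninth semester and up" as 9 are assumptions made for the illustration.

```python
from statistics import median

# Non-time median score norms from Table 8, keyed by semesters of work.
# Key 9 stands for "ninth semester and up" (an assumption of this sketch).
NON_TIME_NORMS = {1: 36, 2: 45, 3: 55, 4: 60, 5: 66, 6: 75, 7: 83, 8: 90, 9: 105}


def compare_with_norm(class_scores, semesters):
    """Return (class median, norm, difference) for the given semester."""
    class_median = median(class_scores)
    norm = NON_TIME_NORMS[semesters]
    return class_median, norm, class_median - norm


# Hypothetical scores for a class finishing its third semester.
scores = [48, 52, 57, 59, 61, 63, 70]
print(compare_with_norm(scores, 3))
```

A positive difference means the class median lies above the norm for classes with the same amount of work; a negative one means it lies below.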


13. Motivation of Student Learning. 

Tests and examinations have long been recognized by teachers as 
useful motivation devices. Many teachers have not realized, however, 
that the extent of this utility depends upon the character of the tests 
themselves. If the test is so constructed that it permits superficial
thinking and shallow answers it stimulates precisely that type of work.
If it calls for critical thinking, exact results, concise statements,
careful evaluation of facts, then the force of the motivation is in the right
direction. The use of even a moderately good test or examination may
accomplish much in the way of stimulating proper habits of work on
the part of the pupils. Sometimes even the mere administration of
the test, or the knowledge on the part of the pupils that it is to be
given, has a desirable effect. The greatest good, however, comes from
the use of a carefully standardized test or scale, followed by the exact
location of individual pupil weaknesses and the application of
corrective measures immediately after their discovery. The best
experimental evidence shows that significant gains in pupil accomplishment
accompany the sane use of properly constructed tests in such a way
that the pupil himself is aware of his accomplishments and limitations.


14. Determination of Efficiency of Instruction. 


These comparisons are interesting and often valuable as general 
guides, but if pupils are making low scores it is much more important 
to know where the scores are low. Is it in lettering, lack of textbook 
knowledge, poor technique, wrong type of instruction sheets, dull tools, 
or lack of interest? Just what are the conditions which cause the 
class to be lower than it should be on a standardized test? By a care- 
ful analysis of results it is often possible to determine weak points in 
the achievement of the class. A chart with the numbers of the test 
items in the standardized test on the left-hand side and the number of 
pupils getting the item correct on the right side is very useful for this 
purpose. Table 9 gives an example of this type of instructional
analysis from a class of twenty eighth-grade boys as tested by Form B
of the Newkirk-Stoddard Home Mechanics Test.² The test was given
at the end of one semester of instruction. An examination of Table 9 
shows that items 1, 6, 8, 12, 16, 20, 23, 24, 26, 27, and 34 are low in 
the numbers of pupils responding correctly. This analysis indicates 
that the jobs which correspond to these numbers probably were not 
taught effectively or at least were not properly mastered by the class. 
As a matter of fact, in this particular case, the main reasons for 
the ineffective teaching were lack of supplies for teaching the jobs 
properly, poor demonstrations, no supplementary references, and a
lack of instructional time on the part of the teacher due to an unduly 
heavy teaching load in other branches. 
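
The tally described above can be sketched as follows. The counts are the first twelve rows of Table 9 (a class of twenty pupils); the cutoff of one-half of the class is an assumption chosen for illustration, since the text does not state what count is considered "low", and `weak_items` is a hypothetical name.

```python
CLASS_SIZE = 20

# Item number -> number of pupils answering correctly (Table 9, items 1-12).
correct_counts = {
    1: 6, 2: 18, 3: 20, 4: 18, 5: 15, 6: 3,
    7: 17, 8: 2, 9: 20, 10: 18, 11: 15, 12: 6,
}


def weak_items(counts, class_size, cutoff=0.5):
    """List items answered correctly by less than `cutoff` of the class."""
    return [item for item, n in sorted(counts.items()) if n / class_size < cutoff]


print(weak_items(correct_counts, CLASS_SIZE))
```

With this cutoff the sketch flags items 1, 6, 8, and 12, which agrees with the items the text names as low within that range.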

15. Class Diagnosis. 


Standardized tests of achievement are of value to industrial educa- 
tion teachers in determining the difficulties and abilities of the various 
members of the class. It is generally known that the background and
abilities of the individual members of any class may vary widely. If 
the shop teacher is able to secure an accurate picture of the informa- 
tion and skills that the pupils already have when they enter the class, 
it will be of great value in placing the emphasis so the greatest
instructional efficiency will result from the time allotted. This type of
information is especially important to industrial education teachers who
are teaching advanced classes, but it is valuable in all classes on the
secondary level.

² Newkirk, L. V., and Stoddard, George D., Newkirk-Stoddard Home
Mechanics Test, Bureau of Educational Research and Service, Iowa City,
Iowa, 1928.


TABLE 9

ANALYSIS OF CLASS INSTRUCTIONAL WEAKNESS IN HOME MECHANICS
Newkirk-Stoddard Home Mechanics Test, Form B

Test Item   | Number of Correct Responses
1           | 6
2           | 18
3           | 20
4           | 18
5           | 15
6           | 3
7           | 17
8           | 2
9           | 20
10          | 18
11          | 15
12          | 6
13          | 15
14          | 14
15          | 15
16          | 5
17          | 3
18          | 20
19          | 18
20          | [illegible]
21          | [illegible]
[illegible] | 17
23          | [illegible]
24          | [illegible]
25          | [illegible]
26          | 1
[illegible] | 14
29          | 18
[illegible] | 17
[illegible] | 15
31          | 14
32          | 16
33          | 2
34          | 13
35          | 11


Even in small classes it is obvious that there is wide variability in
the number of jobs that the pupils already know how to do and also
wide variability in the specific jobs. Table 10 shows this very clearly
by data obtained by giving the Newkirk-Stoddard Home Mechanics 
Test to a class of nine eighth-grade boys at the University of Iowa 
High School to determine which items in home mechanics they already 
knew and to see where to put the instructional emphasis for each 
pupil. 
TABLE 10

NUMBER OF ITEMS IN NEWKIRK-STODDARD HOME MECHANICS TEST EACH PUPIL
ANSWERED CORRECTLY³

Pupil       | Items, Form A                       | Items, Form B
L           | 12; 15                              | 1
[illegible] | 1; 2; 4; 5; 14                      | 2; 5; 14; 17; 32; 36
P           | 1; 2; 4; 5; 9; 10; 15; 30           | 1; 2; 4; 5; 10; 12; 13; 14; 16; 29; 34
S           | 1; 6; 4; 5; 22                      | [illegible]; 15; 10; 28
D           | 2; 18; 33; 34; 35; 36               | 2; 4; 5; 15; 33; 34; 35
H           | 1; 2; 4; 5; 9; 12; [illegible]; 23  | 1; 2; 6; 8; 10; 12; 14; 15; 33; [illegible]
Z           | 1; 2; 4; 15; 36                     | 4; 6; 12; 34
W           | 1; 2; 4; 5; 6; 8; 18; 30; 32; 34    | 2; 4; 13; 14; 16; 17; 26; 28; 29; 33
M           | 2; 6; 7; 8; 9; 12; 17; 26; 29       | 2; 3; 4; 6; 8; 9; 10; 11; 13; 16; 18; 19; 21; 23; 28; 29; 33; 34; 36


Out of the seventy-two items in the two forms of the test, only 
twenty-three were not answered correctly by some of the pupils. The
highest score that any one received was twenty-eight jobs right
(pupil M). The results are very valuable from the standpoint of
instructional efficiency because this pupil will not have to spend time
repeating material with which he is familiar. In the case of pupil L,
who scored only three right, the test has identified an individual who 
needs careful attention. To the pupil himself it is a clear indication 
of the need for more instruction. 
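
As a rough check on these figures, the entries of Table 10 can be counted mechanically. The sketch below uses only the four rows that are fully legible in this copy (pupils L, D, Z, and M); the helper names `parse_items` and `jobs_known` are hypothetical.

```python
TOTAL_ITEMS = 72  # thirty-six items in each of the two forms


def parse_items(cell):
    """Turn a table cell such as '12; 15' into a set of item numbers."""
    return {int(part) for part in cell.split(";") if part.strip()}


def jobs_known(form_a, form_b):
    """Number of items answered correctly across both forms."""
    return len(parse_items(form_a)) + len(parse_items(form_b))


# (Form A cell, Form B cell) for the fully legible rows of Table 10.
table_10 = {
    "L": ("12; 15", "1"),
    "D": ("2; 18; 33; 34; 35; 36", "2; 4; 5; 15; 33; 34; 35"),
    "Z": ("1; 2; 4; 15; 36", "4; 6; 12; 34"),
    "M": ("2; 6; 7; 8; 9; 12; 17; 26; 29",
          "2; 3; 4; 6; 8; 9; 10; 11; 13; 16; 18; 19; 21; 23; 28; 29; 33; 34; 36"),
}

totals = {pupil: jobs_known(a, b) for pupil, (a, b) in table_10.items()}
remaining = {pupil: TOTAL_ITEMS - n for pupil, n in totals.items()}

for pupil in totals:
    print(pupil, "knows", totals[pupil], "jobs;", remaining[pupil], "remain")
```

The counts for pupils L and M come out to three and twenty-eight, the scores quoted in the text.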

Information of this type is valuable not only for increasing in- 
structional efficiency, but also for motivating the pupils on their proper 
level of accomplishment. Unless the teacher has previous knowledge 
that pupil M knows how to do twenty-eight of the jobs specified he is 
quite likely to waste his own and the pupil's time through useless
repetition. This usually results in building up bad habits of work on the
part of the pupil. Furthermore, the teacher might assume that pupil
M did not know how to do any of the tasks in the test when as a
matter of fact he knows much of what he has to learn. For example,
pupil L who knows three jobs might learn twenty more and his score
would then be twenty-three, and pupil M who knows twenty-eight
might learn ten more and have a score of thirty-eight. The boy who
had learned twenty would have accomplished more, but the final score
would not indicate that he had accomplished twice as much. In fact,
it would give the impression that he had accomplished fifteen less.
This merely illustrates the need for giving industrial education tests
at the beginning of a course to discover what is already known, during
the semester for indications of progress and for motivation, and at the
close of the semester to measure accomplishment and growth.

³ Newkirk, L. V., Validating and Testing Home Mechanics, University of
Iowa Study in Education, University of Iowa, Iowa City, Iowa, Series 201,
1931, pp. 30-31.
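
The arithmetic of this example can be written out directly. The figures are those given in the text for pupils L and M; the function name `gain` is a hypothetical label for final score minus initial score.

```python
def gain(initial_score, final_score):
    """Jobs learned during the course: final score minus initial score."""
    return final_score - initial_score


# Initial and final scores for pupils L and M, as given in the text.
pupil_l_initial, pupil_l_final = 3, 23
pupil_m_initial, pupil_m_final = 28, 38

gain_l = gain(pupil_l_initial, pupil_l_final)
gain_m = gain(pupil_m_initial, pupil_m_final)

print("Pupil L learned", gain_l, "jobs")
print("Pupil M learned", gain_m, "jobs")
print("Final-score difference (M minus L):", pupil_m_final - pupil_l_final)
```

Judged by final scores alone, pupil L appears fifteen points behind; judged by gain, he learned twice as many jobs. Only a test given at the beginning of the course makes the second comparison possible.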


16. Individual Pupil Diagnosis. 

Closely related to the measurement of class and individual levels 
of accomplishment is the diagnosis of individual learning difficulties of 
certain pupils in the class. Just as in the other instructional fields, the 
teacher may assume that these pupils are naturally slow or do not try. 
It frequently occurs, however, that upon closer examination these 
slower pupils have many learning difficulties which can be corrected by 
the application of proper remedial teaching. The possible causes of 
these difficulties are numerous; they may be one or more of the fol- 
lowing: malnutrition, defective eyesight, difficulty in hearing, poor 
reading ability, poor technique in manipulation of some or all tools, 
inability to adjust tools, inability to read a working drawing, ignor- 
ance of sizes of tools, unfamiliarity with related mathematics, low 
mechanical ability, emotional maladjustment, social maladjustment, 
and low intelligence. These difficulties are usually obvious in extreme 
cases, but the majority of the pupils in the class may have one or more 
of the difficulties which will seriously affect their ability to profit from
the instruction. It is on this account that the industrial education 
teacher needs as much professional information about his pupils as it 
is possible to secure in order better to adapt his instruction to the indi- 
vidual differences and abilities of his pupils. 

The efficient shop teacher must know how to test many factors 
other than those that relate directly to achievement in industrial edu- 
cation. Fundamentally he is a teacher of individuals and not a 
teacher of drawing, woodwork, metal work, electricity, printing, or 
auto mechanics. Table 11 illustrates types of information which
industrial education teachers will find useful in their teaching and
guidance activities.


TABLE 11

DESIRABLE TYPES OF PROFESSIONAL INFORMATION

Name......John J.......     Grade......10

Test                 | Score
Intelligence test    | 112
Reading              | 40
Language             | 20
Spelling             | 55
Writing (quality)    | 60
Mathematics          | 75
Mechanical aptitude  | 120

Hypothetical grade norms for the test scores in Table 11 are given
as follows:

Grade | Intelligence Test | Reading | Language | Spelling    | Writing | Mathematics | Mechanical Ability
7     | 90                | 40      | 20       | 40          | 60      | 30          | 70
8     | 95                | 45      | 32       | 45          | 65      | 40          | 80
9     | 100               | 55      | 45       | 50          | 72      | 50          | 100
10    | 110               | 60      | 50       | [illegible] | 80      | 75          | 120
11    | 122               | 65      | 60       | 75          | 86      | 80          | 135
12    | 136               | 75      | 65       | 82          | 90      | 84          | 150


The information in Table 11 gives the achievements obtained by
a tenth-grade pupil on a number of tests, and the hypothetical grade
norms indicate what the pupil's level of achievement should be. The
levels of accomplishment are indicated graphically in Fig. 2.

[Profile chart: each test in Table 11 plotted against grade level, 7 to
11. Values legible in this copy: Intelligence, 10.2; Mathematics, 10.0;
Mechanical Ability, 10.0.]

Fig. 2. Graph of Achievement According to Grade Level.

The profile chart shows that John J. has intelligence slightly better
than a tenth-grade pupil, the reading and language ability of a
seventh-grade pupil, spelling ability a little above that of a ninth-grade
pupil, writing quality equal to that of a seventh-grade pupil, mathe- 
matical and mechanical ability equal to that of a tenth-grade pupil. 
Assume further that this pupil is in the tenth grade in electrical shop 
and is making slow progress. The teacher in the electric shop is 
using instruction sheets and related reference materials as supple- 
mentary teaching devices. The students are required to do consid- 
erable reading and to write out the answers to the questions on the 
individual instruction sheets. By studying the chart in Fig. 2 it is
obvious why this pupil has difficulty in making satisfactory progress. 
He cannot read well and is a poor writer, although he has intelligence 
and mathematical and mechanical ability adequate for doing good 
work in the course. The remedy here is special instruction in reading 
and language with additional emphasis on writing legibly. 
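
The grade levels plotted in Fig. 2 can be approximated from the hypothetical norms by interpolation. The sketch below assumes straight-line interpolation between adjacent grade norms, which the text does not specify, and it omits spelling because the tenth-grade spelling norm is illegible in this copy. The name `grade_equivalent` is hypothetical.

```python
GRADES = [7, 8, 9, 10, 11, 12]

# Hypothetical grade norms from the text (spelling omitted, see lead-in).
NORMS = {
    "Intelligence": [90, 95, 100, 110, 122, 136],
    "Reading": [40, 45, 55, 60, 65, 75],
    "Language": [20, 32, 45, 50, 60, 65],
    "Writing": [60, 65, 72, 80, 86, 90],
    "Mathematics": [30, 40, 50, 75, 80, 84],
    "Mechanical aptitude": [70, 80, 100, 120, 135, 150],
}

# John J.'s scores from Table 11.
SCORES = {
    "Intelligence": 112, "Reading": 40, "Language": 20,
    "Writing": 60, "Mathematics": 75, "Mechanical aptitude": 120,
}


def grade_equivalent(score, norms, grades=GRADES):
    """Grade level for a score, interpolating linearly between norms.

    Scores at or beyond the ends of the table are clamped to the lowest
    or highest grade listed.
    """
    if score <= norms[0]:
        return float(grades[0])
    if score >= norms[-1]:
        return float(grades[-1])
    for i in range(len(norms) - 1):
        low, high = norms[i], norms[i + 1]
        if low <= score <= high:
            fraction = (score - low) / (high - low)
            return grades[i] + fraction * (grades[i + 1] - grades[i])


profile = {test: round(grade_equivalent(SCORES[test], NORMS[test]), 1)
           for test in SCORES}
for test, level in profile.items():
    print(f"{test}: grade {level}")
```

Under this assumption the sketch gives 10.2 for intelligence and 10.0 for mathematics and mechanical aptitude, matching the values legible on the chart, and grade 7 for reading, language, and writing, matching the description in the text.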

All the illustrations used here have been on the basis of grade 
norms because they are easy to compute and illustrate the different 
levels of accomplishment. However, many industrial education
teachers may wish to classify pupils on the basis of ability to learn, and
reveal progress by using mental ages and achievement ages of the
pupils in their classes. The various types of norms are discussed in
Chapter XV.

17. Gradation and Guidance.
Tests of intelligence and mechanical and special aptitudes are of
value to supervisors and teachers of industrial education in classifying
pupils with approximately equal learning power. In a large school
system where there are several sections of a class, it is usually
considered desirable for instructional purposes to classify pupils into
groups of about equal learning abilities. It is easier to meet the
individual learning difficulties of a group of pupils if they have about the
same general intelligence. In the small school it may not be possible
to divide the pupils into instructional groups of approximately equal
learning ability, but usually the classes are small and the teacher has
more time for individual instruction. The industrial education teacher
may divide classes on the basis of either mechanical ability or
intelligence. Both these factors are important in shop instruction, but they
do not correlate highly. Some pupils have low mechanical ability and
high intelligence, others high mechanical ability and low intelligence,
as indicated by intelligence tests. Scores on intelligence tests and
scores on mechanical-ability tests usually have a positive correlation
of between .20 and .30. Of course, the large majority of pupils have
about average mechanical ability and average intelligence. If an
industrial education class is selected on the basis of scores on intelligence
tests the result will be a class with similar intelligence ratings but with
variable mechanical aptitude. If selected on the basis of mechanical
aptitude the intelligence ratings will be variable.
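
For readers who wish to check the correlation between two sets of scores in their own classes, the following sketch computes the ordinary product-moment (Pearson) coefficient. The text does not say which coefficient underlies the quoted figures of .20 to .30, so that choice is an assumption, and the paired scores shown are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical paired scores for eight pupils (not data from any real class).
intelligence = [95, 102, 110, 88, 120, 105, 99, 115]
mechanical = [80, 120, 95, 100, 110, 70, 130, 105]

r = pearson_r(intelligence, mechanical)
print(f"r = {r:.2f}")
```

A coefficient near zero means that sectioning a class on one measure leaves the other measure widely scattered, which is the point made in the paragraph above.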


In the first two years of the junior high school where information
about tools, materials, and industries is considered more important
than acquiring outstanding tool skill, it seems desirable to section
classes on the basis of intelligence scores, because of the nature of the
learning problems. In advanced courses where trade training and the
acquiring of trade skill are the dominant objectives, it probably is
better to classify pupils on the basis of mechanical aptitude, since that
is of vital importance in acquiring outstanding skill in manipulating
tools and materials. In either case, it would be desirable to have
both ratings for use in adapting instruction to the individual
difficulties of the pupils.

Tests of mechanical aptitude or mental ability are usually
administered by the supervisory officers in the school or by persons
especially [...] meaning. The scores from the best educational measures are
not so reliable for individual diagnosis as they are for indicating general
trends or levels of accomplishment. It has been found that test scores
which are very high or very low are most likely to be in error. The
combined scores of several similar tests can be used with more
certainty in diagnosis of pupil difficulties than any one score. Before
[...] very high or very low score. [...] on carefully prepared
educational tests are more accurate than the teacher's subjective judgment,
but they are not accurate enough to be considered final and used
dogmatically.

Teachers of industrial education should be very careful not to
confuse the purposes of aptitude and special-ability tests with
achievement tests. A pupil who has a high score on tests of intelligence and
mechanical ability, other factors being equal, should do good work in 
industrial education courses. The fact that a pupil has these abilities 
does not necessarily mean that he should receive a high mark. Re- 
gardless of a pupil’s abilities, he should be marked on the basis of 
actual achievement in the course taken. Standardized and teacher- 
made tests should be utilized for measuring achievement and the re- 
sults used as a major factor in assigning shop marks. Tests of spe- 
cial abilities are valuable in guidance, in classification of pupils, and 
in pointing out individual pupil difficulties. They are not of
particular value in measuring the amount of information or skill
acquired in industrial education courses.


18. Tests in Research. 

One of the obligations of a teacher to his profession is to discover 
new truths which can be applied for the improvement of work in his 
chosen field of endeavor. Carefully constructed educational tests can 
be used to discover new and better ways of organizing and teaching 
industrial education. It does not seem likely that a scientific method 
of instruction can be developed in any instructional field without
suitable measures of achievement and abilities. The following are
examples of a few of the problems which could be solved in part
through the use of adequate tests.

1. What are the relative values of different teaching methods for industrial
   education subjects (use of demonstrations, instruction sheets, class
   instruction, individual instruction)?
2. What type of shop organization is most effective (composite, unit)?
3. How much instructional time should be given to lecture, demonstration,
   and individual instruction?
4. What types of individual instruction sheets are most effective at different
   grade levels?
5. What is the proper size of a class in drawing, sheet metal, machine shop,
   foundry, woodwork, auto mechanics, printing, and the general shop?
6. What is the most economical length of period to be used in industrial
   education instruction?
7. What is the most effective classification of instructional materials in
   industrial education courses on the basis of grade accomplishment?


SUMMARY 


Educational measurements have the following general uses in in- 
dustrial education: to measure class and pupil achievement, to estab- 
lish standards of performance, to motivate learning, to diagnose pupil 
learning difficulties, to mark and promote pupils, to classify pupils ac- 
cording to abilities, and to study the effectiveness of teaching methods. 


Educational measurement is a fundamental factor in teaching indus- 
trial education. Standardized and teacher-made tests are valuable in 
measuring achievement. Aptitude tests are valuable in guidance and 
diagnosing individual difficulties. Teachers need a great deal of pro- 
fessional information about their pupils other than measures of 
achievement if their courses of instruction are to be effectively adapted 
to individual needs of their pupils. 


SUMMARY EXERCISES FOR DISCUSSION 


1. List the major factors which would make it difficult for the classroom teacher 
to construct tests which will have the merits of carefully constructed stand- 
ardized tests. 

2. Enumerate and illustrate the six main uses of tests in industrial education. 

3. Show how tests of intelligence and special aptitudes may be used for grada- 
tion and guidance purposes. 

4. What is the teacher's responsibility for the use and interpretation of standard 
tests and scales in the classroom and shop? 




CHAPTER IV 
SELECTION AND EVALUATION OF TESTS 


I. CRITERIA FOR INDUSTRIAL EDUCATION TESTS 


Several characteristics of a good test should be considered by the
shop teacher in evaluating published tests or tests of his own
construction. The most important of these are validity, reliability,
objectivity, adequate norms, the existence of duplicate and equivalent
forms, ease of administration, and economy. An understanding of these
factors will do much to insure the selection or construction of a test
suitable for the testing problem at hand.


19. Validity. 

The general concept of validity in a test may be made clear by 
thinking of the conditions set up by the test as a small sampling 
of a larger life situation. At the outset it is assumed that the field 
which the test samples is of some real importance. If this is the case, 
then the more nearly the conditions set up in the test itself duplicate 
the larger situation as found in life the more valid it becomes. For 
example, it would doubtless be possible to prepare a laboratory test 
designed to measure one’s ability to handle an automobile in heavy 
city traffic and one’s reactions to the situations encountered there. 
It would be much more practical (valid) to bring the subject into
direct contact with a bit of heavy traffic and determine exactly how
he does react to it.

From the point of view of the classroom teacher, validity usually
is concerned with the question of whether the materials tested are
actually of real significance, and whether the pupil has had any
adequate opportunity to master the facts tested as a result of his
contact with the course of study taught. Validity may be defined as some
type of objective expression of the degree to which the particular
measuring instrument measures what it is supposed to measure. That
is to say, a test which is designed to measure ability to read blueprints
after a short period of training, and later is found to be a better test
of general intelligence, would be considered to be lacking in validity
for the purpose for which it was designed. Validity is usually
expressed in terms of the correspondence of results obtained from the
particular measuring device under consideration and other, similar
instruments of previously determined validity. Very often it is im- 
possible to secure measures from other instruments of known validity. 
In these cases it is a common practice to refer to estimates or
judgments of individuals who have had an opportunity to evaluate in a
rather definite way the abilities of the individuals involved in the 
validation study. Frequently validity is determined by the extent 
to which a test calls into play the skills and abilities which experi- 
enced observers consider fundamental to success in the given field. 
The validity of many of the items in the Newkirk-Stoddard Home
Mechanics Test is dependent to a large degree upon the agreement
of certain teachers, supervisors, and other qualified authorities that 
the processes called for are the significant ones. 


The validation of the content of this test was achieved in part 
by the pooled judgment of experienced teachers, home owners, and 
tradesmen. The home owners indicated the projects and content 
which they believed to be important in the maintenance and operation 
of the home. The teachers of home mechanics in 75 schools marked
the jobs which they considered most important. In developing the 
procedure type of question used in the test, it was necessary to have 
the procedures checked against good trade practice by tradesmen. 
Table 12 gives the ten most frequently occurring home mechanics
projects according to the judgment of 100 home owners living in
small towns in the middle west. Table 13 gives the ten highest-ranking
jobs in home mechanics according to the judgment of 75 teachers
of the subject. Table 14 gives a procedure rearrangement question
taken from the Newkirk-Stoddard Home Mechanics Test, the numbers
in the parentheses indicating the best trade procedure according to
the five tradesmen who judged it.


TABLE 12

TEN HIGH-RANKING HOME MECHANICS JOBS ACCORDING TO 100 HOME OWNERS

Job                                                  Frequency
 1. To sharpen knives                                    98
 2. To install a pair of hinges                          98
 3. To put new screen on a window or door                95
 4. To connect batteries                                 95
 5. To shape the point of a screw-driver                 95
 6. To wash a window                                     95
 7. To use glue for general repair                       94
 8. To regulate a watch or clock                         94
 9. To fire a furnace                                    94
10. To locate a blown fuse and replace                   94

In many of the achievement tests, validity depends to a large de- 
gree upon the opportunity which the pupil has had to master the 
information covered by the test. The validity of a test may be 
thought of as being general, or it may be considered as being specific. 


TABLE 13

TEN HIGH-RANKING HOME MECHANICS JOBS ACCORDING TO 75 HOME MECHANICS TEACHERS

Job                                                              Frequency
 1. To make suitable splices, taps, and terminals in
    electric wires                                                   64
 2. To tin a soldering copper                                        63
 3. To wire an electric-light socket                                 63
 4. To mend leaks in kitchen utensils                                62
 5. To make an extension cord                                        62
 6. To wire simple bell circuits                                     61
 7. To apply stain and filler                                        61
 8. To apply varnish                                                 60
 9. To cut glass to size                                             60
10. To repair leaking compression faucet                             57


TABLE 14

A REARRANGEMENT QUESTION WITH THE ANSWER AS APPROVED BY 5 TRADESMEN

To Cut a Piece of Pipe:
Procedure: (1) Set the cutter on the mark.
           (2) Ream the end.
           (3) Determine the length of pipe.
           (4) Cut the pipe.
           (5) Measure and mark.
           (6) Adjust the pipe cutter.
Answer:    (3) (5) (6) (1) (4) (2)
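
A rearrangement question of the kind shown in Table 14 can be scored without any exercise of personal judgment, since the keyed order is fixed in advance. The following is a minimal sketch in Python of such a check. The function names and the all-or-nothing scoring rule are assumptions made for illustration; they are not taken from the Newkirk-Stoddard test or its manual.

```python
# Keyed order from Table 14: step numbers in the order approved by the tradesmen.
KEY = [3, 5, 6, 1, 4, 2]


def score_rearrangement(response, key=KEY):
    """Return 1 if the pupil's ordering matches the key exactly, else 0.

    All-or-nothing credit is an assumption made for this sketch.
    """
    return 1 if list(response) == list(key) else 0


def steps_in_place(response, key=KEY):
    """Count the positions at which the pupil's step agrees with the key.

    Hypothetical diagnostic aid only; it is not part of the score.
    """
    return sum(1 for given, keyed in zip(response, key) if given == keyed)


if __name__ == "__main__":
    pupil = [3, 5, 1, 6, 4, 2]  # pupil set the cutter before adjusting it
    print(score_rearrangement(pupil), steps_in_place(pupil))
```

The second function is included only to suggest how the same key might serve a diagnostic purpose, by showing at which steps a pupil's procedure departs from good trade practice.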


Many tests are undoubtedly valid in a general sense but are lacking 
in validity in a specific sense. For instance, a survey test designed 
to secure a general bird's-eye-view of achievement in a particular 
subject must be validated in terms of its ability to test for the basic 
items found in the courses of study, not of a single class in which 
certain points of view and certain facts have been emphasized, but of 
the many different schools in which it may be used. For his own 
particular class a teacher may easily construct a test which will have
much greater specific validity for his purposes and his point of view
than any type of commercial standardized test could possibly have.
Recently considerable recognition has been given to this phase of 
validity in tests by providing for the classroom teacher source books 
of objective test exercises in a number of subject matter fields.¹ Such
material permits the classroom teacher to secure a relatively high
specific validity for his tests and quizzes.


20. Reliability. 


The reliability of a test may be thought of as the consistency with 
which it performs. In a certain sense this matter of consistency of 
performance of a test arises from two factors, the adequacy of the 
sampling represented by the test, and variations in the human re- 
sponse itself which have nothing to do with the content of the test. 
The first of these can be controlled somewhat by selecting the test 
items carefully and extensively from the field which it is supposed 
to measure. The principle of sampling may be illustrated by the prac- 
tice of the large producers of ore. Obviously it would be impossible 
to examine and test every cubic foot of the ore in every carload.
It is a simple matter, however, for specimens of the ore to be taken 
from different parts of the car and from different cars. These speci- 
mens are carefully mixed together and subjected to the tests which 
determine the quality and price of the ore. This process is called 
sampling. If only one specimen were taken from each car there would 
always be the possibility that the ore at that particular spot might 
have been unusually rich or poor. Taking more and more specimens
increases the likelihood that the resulting sample will be truly
representative of the ore in the car. In a similar way, increasing the
number of samples taken in a testing situation makes it less likely
that some important phase of the subject has been missed or given
the wrong emphasis, or that some interfering human factor was
operating at the time the samples were taken.

The accompanying diagram illustrates the effect of sampling on
the reliability of a test over a limited field of information. Each
of the small rectangular spaces in Fig. 3 represents an item which a
student has had an opportunity to learn. The thirty shaded portions
represent the items which he has actually learned. The ten unshaded
spaces are items which he has not learned. In this illustration the
student has a mastery of 75 per cent of the items. Now let it be
assumed that a test over this material is prepared comprising ten
items, numbers 1, 2, 3, 6, 7, 8, 10, 12, 16, and 22. If these items are
selected, this individual will fail on six of the ten, making a percentage
score of 40 per cent. However, if ten other items, as numbers 5, 9,
13, 17, 20, 24, 27, 31, 34, 37 are selected over the entire range of his
field of information he should be able to answer nine of the ten
correctly. From this it is clear that it makes a distinct difference where
the sampling is taken. It may cover material learned but may come
from too limited a portion of the total field to be truly representative
of the individual's accomplishment. In this illustration, if the
even-numbered items are chosen, the pupil would probably fail on nine
of the twenty items. If the odd-numbered items are chosen he should
fail on only one. This results in a variation in his score from 55 to
95 per cent. As the number of items chosen for inclusion in the test is
increased the pupil's scores on the test exercises more nearly approach
the actual amount of his information in the field. It thus becomes
apparent that the extent of the sampling is also an important feature
of reliability in a test.

¹ Kirkpatrick, J. E., and Greene, H. A., Pupil-Teacher Handbooks of
Objective Exercises in High School Physics. Bloomington, Illinois: Public
School Publishing Company, 1930.

[Fig. 3. Diagram of forty numbered item spaces, thirty of them shaded,
illustrating the effect of sampling.]

As the reliability of a test is increased, either through more extensive
or more representative sampling, the operation of chance variations
such as temporary disturbances, breaking a pencil, and the like, is
minimized. Similarly, increasing the sampling of the test by increasing
the number of different times the pupil is required to respond to it
tends to limit the effect of physical disturbances, fatigue, emotional
stress, etc. Thus, practically every attempt to increase the consistency
with which educational tests measure abilities and achievements results
in the increase in the length of the test and testing period. It is
becoming increasingly clear that complex fields cannot be measured
reliably by means of brief tests.
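
The arithmetic of the sampling illustration may be checked with a short computation. The sketch below is in Python and rests on one assumption: since the figure itself is not reproduced here, the positions of the ten unlearned items are hypothetical, chosen only so that they agree with the counts stated in the text (six failures in the first ten-item sample, one in the second, nine among the even-numbered items, one among the odd-numbered items).

```python
# Hypothetical layout: the positions of the unlearned items in Fig. 3 are
# not recoverable, so this set is an assumption that matches the counts
# given in the text and nothing more.
ALL_ITEMS = list(range(1, 41))
UNLEARNED = {2, 3, 4, 6, 8, 10, 12, 14, 18, 20}


def percent_correct(sample):
    """Percentage of the sampled items that the pupil has learned."""
    sample = list(sample)
    right = sum(1 for item in sample if item not in UNLEARNED)
    return 100 * right / len(sample)


true_mastery = percent_correct(ALL_ITEMS)
narrow_sample = percent_correct([1, 2, 3, 6, 7, 8, 10, 12, 16, 22])
spread_sample = percent_correct([5, 9, 13, 17, 20, 24, 27, 31, 34, 37])
even_items = percent_correct(i for i in ALL_ITEMS if i % 2 == 0)
odd_items = percent_correct(i for i in ALL_ITEMS if i % 2 == 1)

if __name__ == "__main__":
    print(true_mastery, narrow_sample, spread_sample, even_items, odd_items)
```

Under this assumed layout the ten-item samples swing between 40 and 90 per cent around a true mastery of 75 per cent, while the twenty-item samples give 55 per cent (nine failures in twenty) and 95 per cent. No sample here reproduces the true figure exactly; the sketch shows only how widely a score may vary with the choice of items.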


The reliability of a published test should be given in the manual 
of directions accompanying the test. This information is given as a 
coefficient of reliability. The coefficient of reliability is a statistical 
expression of the consistency of performance of the test and how much 
reliance may be placed on scores obtained from its use. Reliability
coefficients are indicated in decimal fractions of 1, usually ranging
in value from .40 for tests of low reliability to .95 for tests of 
very high reliability. It is improbable that a test will be entirely 
inconsistent or perfectly consistent. However, it is difficult to state 
exactly how low or how high a coefficient of reliability of a test 
should be for the test to be of value. Much depends on the type of 
test and how it is to be used. The data in Table 15 will give some 
appreciation of a suitable coefficient of reliability. 


TABLE 15

RELIABILITY COEFFICIENTS

Coefficient of Reliability        Rating
       .50- .60                   Very low
       .60- .70                   Low
       .70- .80                   Fair
       .80- .85                   Average
       .85- .90                   Good
       .90- .95                   High
       .95-1.00                   Excellent
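
For the teacher who keeps test records in tabular form, the ratings of Table 15 can be applied mechanically. The following Python sketch is one way of doing so. Because adjoining intervals in the table share their end points (.60 closes one interval and opens the next), the sketch assumes that a shared value belongs to the higher rating; that rule, and the function name, are assumptions and are not stated in the table.

```python
# Lower bounds of the Table 15 intervals, highest first. Assigning a shared
# end point to the higher rating is an assumption of this sketch.
RATING_BANDS = [
    (0.95, "Excellent"),
    (0.90, "High"),
    (0.85, "Good"),
    (0.80, "Average"),
    (0.70, "Fair"),
    (0.60, "Low"),
    (0.50, "Very low"),
]


def rate_reliability(coefficient):
    """Return the Table 15 rating, or None below .50 where no rating is given."""
    if not 0.0 <= coefficient <= 1.0:
        raise ValueError("a reliability coefficient is expected between 0 and 1")
    for lower_bound, rating in RATING_BANDS:
        if coefficient >= lower_bound:
            return rating
    return None


if __name__ == "__main__":
    for r in (0.45, 0.80, 0.93):
        print(r, rate_reliability(r))
```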


In general, it is doubtful if a test should be given very serious
consideration if it has a reliability coefficient of less than .80 when
stepped up by the application of the Spearman-Brown technique from 
correlations on chance-half samplings of the exercises. Some experi- 
ence with long and quite reliable tests in the field of silent reading 
indicates that reliability coefficients which range as high as .95 when 
computed on the odd-even basis drop as low as .90 when results from 
the two equivalent forms of the test are correlated. Reliabilities of 
.90 based on the actual relationships between two equivalent forms of 
the test may be considered very high. Additional discussion of the 
meaning and significance of reliability, as well as a more complete 
explanation of the methods of securing and computing reliability data, 
are given in Chapter XIV.
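
The stepping up of a half-test correlation by the Spearman-Brown technique can be illustrated briefly. In its usual form for a test of doubled length the formula is r_full = 2 r_half / (1 + r_half). The Python sketch below applies it to an odd-even split of the exercises. The function names and the layout of the data (one list of item marks per pupil) are assumptions made for illustration, and the sketch is not a substitute for the fuller treatment of these methods in Chapter XIV.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half, factor=2):
    """Estimated reliability of a test `factor` times as long as the part correlated."""
    return factor * r_half / (1 + (factor - 1) * r_half)


def split_half_reliability(item_scores):
    """Odd-even reliability, stepped up to full length.

    item_scores holds one list of item marks (for example 0 or 1) per pupil.
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_totals, even_totals))


if __name__ == "__main__":
    # A half-test correlation of .67 steps up to about .80.
    print(round(spearman_brown(0.67), 2))
```

On this formula a correlation of about .67 between the two halves is needed to reach the stepped-up figure of .80 mentioned above.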


21. Objectivity. 

Objectivity is an important quality in a test exercise since it con- 
tributes indirectly to validity and reliability. Objectivity is that 
quality in a test exercise which makes for the elimination of the per- 
sonal judgment of the person who scores it. Since this means greater 
accuracy in the grading of such items it naturally indicates greater 
reliability in the measurement of whatever qualities are being
measured. Objectivity is a function of the form in which the test items
are stated. In general, objective-test items are so formulated that only
one correct response satisfies the conditions of the exercise. Recall,
true-false, and multiple-answer exercises are all illustrations of
objective forms. These and other common forms of objective exercises are
described and illustrated on pages 107-123 of this book.


22. Ease of Administration. 

The speed, accuracy, and simplicity with which an educational 
test can be given in the classroom, though not a major criterion for 
tests, is nevertheless one that is worthy of some practical
consideration. There is a very definite tendency in modern test
development to
recognize the teacher's and the supervisor's problems by making the 
tests easy to administer and simple to interpret. 

A significant phase of the administrability of a test is the ex- 
aminer’s manual which accompanies it. The manual should contain 
a clear statement of the qualities measured by the test. It should 
provide concise and simple directions for giving the tests so that they 
may be followed verbatim by the classroom teacher. It should pro- 
vide the critical user of the test with an adequate explanation of the 
methods by which the validity and the reliability data were obtained, 
as well as a concise statement of the meaning of these data in relation 
to the tests themselves. A convenient form of answer key or simple 
stencil for scoring should be supplemented by brief explanatory 
directions concerning the methods of scoring the tests. Simple illus- 
trations of the methods of interpreting the results should be given 
in the manual; and if the field is one in which a follow-up program 
of remedial or corrective instruction is possible, brief suggestions for 
such work should be made.

The better standardized tests provide the user with carefully
formulated statements on the following types of items, all of which tend
to protect the pupils and the teacher against the faulty administration
of the tests and wrong and uncritical interpretations of the results:

 1. Number of parts in the test.
 2. Directions for each part or division of the test.
 3. Sample or fore exercises to acquaint the pupils with the method
    of work.
 4. Directions for stopping at the end of a test part or for turning
    the page when necessary.
 5. Definite statements of time limits where required.
 6. Directions for scoring the tests.
 7. Explanations of the norms or standards of accomplishment.
 8. Statement of the total possible scores on each test part and the
    method of securing them.
 9. Explanation and illustration of method of interpretation of
    results in terms of apparent instructional needs.
10. Suggestions for definite remedial attack on the weaknesses
    revealed by the tests.


23. Norms and Standards. 


Norms and standards, although frequently used as synonymous 
terms, are not exactly identical in meaning. Norms represent actual 
levels of accomplishment for specified groups of individuals. Stand- 
ards are usually considered as representing goals to be attained. One 
is what children or pupils are actually able to do; the other is what 
the teacher should strive to have them do. Practically all present-day 
tests are supplied with norms rather than standards. 

Norms are usually based upon the average or median accomplish- 
ment of large numbers of pupils grouped by ages or by grades. Grade 
norms result from the classification of the pupils by grades. Age 
norms result from the classification by ages. The norms, therefore, 
furnish the teacher with a definite basis for anticipating what given 
groups of pupils may be expected to achieve under ordinary school- 
room conditions. They thus afford the basis for the practical inter- 
pretation and evaluation of the testing program and of the classroom 
instruction under analysis. 
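
The computation of grade norms from a body of test scores is simple enough to be sketched in a few lines. The Python example below takes the median score of the pupils in each grade as the norm for that grade. The function name and the sample figures are hypothetical, and the median is used only because the text allows either the average or the median.

```python
from collections import defaultdict
from statistics import median


def grade_norms(records):
    """Return {grade: median score} from an iterable of (grade, score) pairs."""
    scores_by_grade = defaultdict(list)
    for grade, score in records:
        scores_by_grade[grade].append(score)
    return {grade: median(scores) for grade, scores in sorted(scores_by_grade.items())}


if __name__ == "__main__":
    # Hypothetical scores, far fewer than any real norm would require.
    sample = [(7, 40), (7, 52), (7, 46), (8, 55), (8, 61)]
    print(grade_norms(sample))
```

Age norms would be obtained in the same way by grouping the records on age rather than grade. A handful of cases such as this serves only to show the method; published norms depend on large and carefully sampled populations.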

In general, norms, in order to have sufficient reliability for prac- 
tical classroom use, must be based upon rather large and carefully 
sampled populations. There is a growing tendency, however, to base 
standard test norms upon smaller groups of specially selected cases. 
In the past it has been assumed that the inclusion of a large popu- 
lation of unselected eighth-grade pupils would afford the best basis 
for an eighth-grade norm in a specific test field. Evidence brought 
forward by Crawford² indicates that this is not necessarily true. If
there were a perfect balance in the proportion of pupils in a given 
grade who are retarded and accelerated as to mental ability and school 
progress, such an unselected group might provide suitable and repre- 
sentative norms. The actual evidence shows, however, that in the 
typical school grade the retardation (retarded progress) actually ex- 
ceeds the acceleration by a ratio of four to one and sometimes six 
to one. Accordingly, the norms based on unselected cases are not
typical of the actual achievement of the normal individual in the
group. Serious consideration is being given by modern test workers
to the more exacting control of these variables. A number of the
newer tests are reporting norms based upon smaller groups of
individuals selected for their normality for the group of which they are
a part. In a number of cases age norms are based upon the results
of grouping the individuals by means of mental-test scores. This
procedure alone takes care of the serious difficulties arising when the
grouping is by chronological age. To the extent that mental tests are
standardized accurately, a twelve-year-old (mental age) individual
is a twelve-year-old wherever he may be found. Every teacher knows,
however, that a chronological twelve-year-old in the fifth grade is
quite unlike one in the seventh grade.

² Crawford, J. R., Age and Progress Factors in Test Norms, University of Iowa
Studies in Education, Vol. 9, No. 4, June, 1934. University of Iowa, Iowa City.


24. Mechanical Features. 

The mechanical features of a test frequently operate definitely to
affect its ease of administration in the classroom. They are largely
the result of the editing and printing of the test. The paper should
be of good quality, preferably white bond. The illustrations should
be clear-cut and easily identified with the content they are supposed
to illustrate. The page size, the length of line, and the size of type
used are also mechanical features which may influence the usefulness
of a test.


25. Economy. 

Other things being equal, economy as a criterion for standard tests
should undoubtedly be listed last. In the final analysis, any test 
which takes up class time may be counted expensive. Moreover, 
the cheap test is not always the most economical; in fact, quite the 
reverse is apt to be true. Tests costing at the rate of 50 cents per 
hundred and yielding results of limited validity and low reliability 
might readily be much more expensive than tests costing six times 
as much but having validity indices of .80 and reliability coefficients 
of .93. It is not at all unlikely that in the near future educational 
tests will be evaluated in terms of the number of units of valid and 
reliable information yielded per unit of cost.

A modern tendency in test development involves the introduction 
of certain economy features in the booklet, either through the use of 
automatic scoring devices or through design which permits the repeated 
use of the test booklets. Such mechanical features in a test are quite 
acceptable so long as they do not force the test into too badly crowded 
a page, or do not interfere with the validity and reliability of the
measurement which would otherwise be attained.


26. Number and Equivalence of Forms. 


The more useful educational tests are those which exist in multiple 
forms. The forms of a test are secured by preparing two or more
arrangements of similar but not identical test exercises and assigning
them to different test booklets. These multiple forms should normally
be approximately equal in difficulty, not only in terms of the total 
scores earned by groups of equally able individuals, but also in terms 
of the ratings of items in the different levels of the test. 


II. EVALUATION OF TESTS

27. Test-Rating Scales.


In the foregoing discussions no attempt has been made to evaluate 
the items mentioned in the criteria but merely to explain and discuss 
the types of information which are of value in selecting a published 
test or in judging the value of an objective shop test. However, a 
number of rating scales have been developed in which the different
factors are weighted so as to give a test a rating on the basis of 100
points. These scales are useful to all teachers in selecting tests but
are especially valuable to the inexperienced teacher until he becomes 
accustomed to judging the different items. The Otis test-rating scale 
is reproduced here. 

OTIS TEST-RATING SCALE ³

Manual                                    (5)
Validity                                  (15)
Reliability                               (10)
Reputation                                (5)
Ease of administration                    (total 15)
    (a) Preparation                       (4)
    (b) Time limits                       (4)
    (c) Explanation needed                (3)
    (d) Alternative forms                 (4)
Ease of scoring                           (total 15)
    (a) Objectivity                       (10)
    (b) Time required                     (3)
    (c) Simplicity                        (2)
Ease of interpretation                    (total 15)
    (a) Norms                             (5)
    (b) Directions for interpreting       (4)
    (c) Class record                      (1)
    (d) Application of results            (5)
Convenient packages                       (5)
Typography and make-up                    (5)
Test service                              (10)

Total                                     100


³ Otis, A. S., "Scale for Rating Tests," Test Service Bulletin No. 13. Yonkers:
World Book Company. 6 pp.
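
The use of a weighted scale of this kind amounts to simple addition within fixed limits, as the following Python sketch suggests. The maximum points are those of the scale reproduced above. The function name, the treatment of unrated items as zero, and the sample ratings are assumptions made for illustration and are not part of the Otis bulletin.

```python
# Maximum points for each item of the scale reproduced above.
OTIS_WEIGHTS = {
    "Manual": 5,
    "Validity": 15,
    "Reliability": 10,
    "Reputation": 5,
    "Administration: preparation": 4,
    "Administration: time limits": 4,
    "Administration: explanation needed": 3,
    "Administration: alternative forms": 4,
    "Scoring: objectivity": 10,
    "Scoring: time required": 3,
    "Scoring: simplicity": 2,
    "Interpretation: norms": 5,
    "Interpretation: directions for interpreting": 4,
    "Interpretation: class record": 1,
    "Interpretation: application of results": 5,
    "Convenient packages": 5,
    "Typography and make-up": 5,
    "Test service": 10,
}


def rate_test(points_awarded):
    """Total rating out of 100; items left unrated count as zero (an assumption)."""
    total = 0
    for item, points in points_awarded.items():
        if item not in OTIS_WEIGHTS:
            raise KeyError(f"not an item of the scale: {item}")
        if not 0 <= points <= OTIS_WEIGHTS[item]:
            raise ValueError(f"{item}: {points} is outside 0 to {OTIS_WEIGHTS[item]}")
        total += points
    return total


if __name__ == "__main__":
    # Hypothetical partial rating of a single test.
    print(rate_test({"Validity": 12, "Reliability": 8, "Scoring: objectivity": 10}))
```

Rejecting any award above the stated maximum keeps the total within the 100 points on which the scale is based.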


III. SUMMARY


The teacher should select his tests with care in order to get full 
value for the money and time expended for testing purposes. The 
chief factors to be considered in selecting a test are validity, reli- 
ability, objectivity, norms, multiplicity and equivalence of forms, ease 
of administration, and cost. Validity is the degree to which a measur- 
ing instrument measures the thing it purports to measure. Reliability 
is the consistency of performance of the test itself. Objectivity is de- 
termined by the form of exercise, which in turn controls the number 
of acceptable answers for each question. Norms are the median or 
average performance of pupils of different ages or grades as deter- 
mined by the testing of large numbers of individuals. Standards are 
desirable ultimate goals of attainment. Tests should be economical 
of time and money. The Otis test-rating scale is suggested as a use- 
ful guide for the shop teacher in the selection of commercial tests. 


SUMMARY EXERCISES FOR DISCUSSION 


1. Formulate a concise definition of each of the major criteria for the selection
   of an educational test.

2. Illustrate at least three types of procedure which may be used in the
   validation of industrial education test items.

3. Show by means of a concrete illustration how sampling affects the reliability
   of a test.

4. What are some of the most effective devices for making test exercises
   objective?

5. In your judgment, is the apparent difference between norms and standards
   a matter of any practical significance?

6. Secure at least one complete sample set of standardized tests suitable for
   use in industrial education classes. Examine the tests, the manuals, keys,
   and other supplementary material and rate the test in detail, using the
   Otis Score Card for Rating Tests. Consider each item in the light of the
   explanations given with the score card, and assign a value to each point in
   proportion to its apparent merit.




CHAPTER V 
MEASURABLE FACTORS IN INDUSTRIAL EDUCATION 


28. Factors Related to Mastery in Industrial Education. 

Industrial education teachers, in common with teachers in most 
other subjects, need a great deal of precise information about the 
pupils in their classes if they are to work effectively. Most of the 
data needed can be obtained through measures which are much
superior to the teacher's unaided judgment. Some of these measuring
instruments are still to be developed, and it is certain that many now
available need refinement. It is the purpose of this chapter to point
out some of the measurable factors in industrial education. These
factors present a real challenge to teachers of industrial education.
Educational guidance and shop instruction both doubtless would be 
markedly improved through the measurement of these factors and the 
wise use of the results in the classroom. These measurable factors 
are enumerated in Table 16, and each will be discussed in this chapter. 


TABLE 16

MEASURABLE FACTORS IN INDUSTRIAL EDUCATION

Measurable Factors                Comments

 1. Information                   Factual information about tools, materials,
                                  and vocations. (Oak is a cabinet wood; the
                                  micrometer is an instrument used to measure
                                  in thousandths of an inch; outside paint
                                  contains oil.)

 2. Quality                       Evaluation of the product of manipulative
                                  work in the light of tool, instrument, or
                                  machine operations (drawing, hammering,
                                  house wiring, bookcase, cement lawn
                                  pedestal).

 3. Technique                     Evaluation of skill in manipulating tools,
                                  instruments, or machines in executing tool
                                  operations (method of using a plane, a
                                  compass, a lathe).

 4. Speed (Rate of response)      The time required to accomplish a piece of
                                  work employing commercial standards (time
                                  required to make a drawing, a table, wire a
                                  house).

43 


44 


MEASURABLE FACTORS IN INDUSTRIAL EDUCATION 


TABLE 16—(Continued) 


MEASURABLE FACTORS IN INDUSTRIAL EDUCATION 


Measurable Factors 


Comments 


10. 


LL, 


12. 


13. 


14. 


15. 


Reading technical symbols 


Reading 


Spelling 


Mathematics 


Appreciation of industrial prod- 
ucts 


Planning 


Language 


Inventiveness 


Personality Traits 


Mechanical aptitude 


Intelligence 


Ability to read working drawings, wiring 
diagrams, architectural drawings, etc. 
Ability to read and comprehend instruc- 
tions or related information from the 
printed page. 

Evaluation of ability to spell common 
words and necessary technical terms. 
Evaluation of mathematies required in 
the various shop courses (woodwork, 
drawing, sheet metal, machine shop, 
home mechanics). 

Evaluation of ability to rank industrial 
produets according to merit (furniture, 
electrical devices, finishes, automobiles, 
houses, radio). 


Evaluation of ability to develop a suit- 
able plan for doing a job (building а 
lawn bench, a dog house, a fence, a 
radio, etc.). 

Ability to use correct English in written 
and oral form. 

Ability to see new relations and develop 
devices and machines for the improve- 
ment of society. 


Rating of traits generally recognized as 
essential to success (industry, coopera” 
tion, consideration for others, self-re- 
liance, aggressiveness). 

Natural aptitude for manipulating me- 
chanical devices and an understanding 
of their operation. 

Ability of an individual to learn as 
measured in terms of the extent a pupil 
has acquired a number of specific and 
largely unrelated abilities. 


These fifteen factors are obviously closely related to mastery in
industrial education. To the extent that they are basic they should
represent the framework of measurement in this field. It is probable
that they can be measured most effectively and the results used to
best advantage when each is tested separately. It should not be unduly
difficult to construct devices for the measurement of information,
tool techniques, quality, and speed in more or less isolated forms as a
basis for a real analysis of achievement in this subject. It is believed,
therefore, that these categories are of sufficient importance to warrant
the elaboration and discussion of each in order of appearance in
Table 16.

Information. Industrial education courses contain a great deal of 
knowledge about tools, machines, materials, and industries. This is 
especially true in the junior high school where a great deal of emphasis 
is placed on outlook and cultural training and less on the strictly voca- 
tional aspects of the subject. The vocational courses also contain 
much instructional material which is designed to develop knowledge 
about tools and materials rather than skill in the actual modification 
of materials through the use of tools and machines. This information 
or knowledge is similar in all its psychological aspects to knowledge 
in other subjects of the school curriculum and can be measured effec- 
tively with the common type of objective examination. In fact, the 
majority of the tests of achievement which to date have appeared in 
industrial education are measures of information. The same tech- 
niques which have been used for constructing tests in other subjects 
can be applied with only slight modification. However, there is still 
a great deal of work to be done before an adequate supply of measures 
of information will be available for the content of the varied subjects 
of industrial education. Some indication of the extent of this problem 
will be gained from an examination of the accompanying outline of 
informational items on woodworking adapted from the American Vocational
Association committee’s report¹ on standards of accomplishment
in the industrial arts.

¹ Report of American Vocational Association Committee on Standards of
Attainment in Industrial Arts, Bulletin of American Vocational Association,
Industrial Arts Section, pp. 28-31, December, 1931.


EXAMPLES OF THE INFORMATIONAL CONTENT OF
INDUSTRIAL EDUCATION


WOODWORKING


A. Lumber.
   1. Know the principal characteristics, the working qualities, the principal
      uses, and the sources of supply of the following woods: pine, cypress,
      oak, walnut, birch, maple, mahogany, red cedar, hickory, gum, chestnut,
      poplar.
   2. Know the methods of cutting and milling lumber.
   3. Know how lumber is dried.
   4. Know the effect of moisture on wood.
   5. Know the standard dimensions of lumber and how classified.
   6. Know the nominal and the actual dimensions of lumber.
   7. Know how veneer and plywood are made, and their uses.

B. Finishes.
   1. The object of finishes.
   2. The kinds of finishes in common use; such as stain, oil, wax, shellac,
      varnish, lacquer, enamel, paint.
   3. The durability of different finishes.
   4. The conditions or places in which various kinds of finishes may be used
      to advantage.
   5. Materials from which finishes are made.

C. Glue.
   1. The kinds of glue.
   2. The preparation of glue.
   3. The conditions and requirements in use.

D. Nails, brads, and fasteners.
   1. The kinds of nails.
   2. The uses of the different kinds.
   3. The size of nails.
   4. How nails are sold.
   5. How nails are manufactured.
   6. Sizes of brads and how sold.
   7. Size, kinds, and uses of corrugated fasteners.
   8. Sizes and uses of clamp-nails.

E. Screws.
   1. The kinds of screws.
   2. The uses of the different kinds.
   3. How the sizes and kinds of screws are indicated.
   4. How they are sold.

F. Sandpaper and steel-wool.
   1. The kinds of sandpaper.
   2. Grades of sandpaper.
   3. Principal uses.
   4. Grades and uses of steel-wool.

G. Design of furniture.
   1. Is it adapted to the use for which it is indicated?
   2. Is it structurally good?
   3. Is it well made?
   4. Are the structural members in good proportion?
   5. Does it have an appearance of stability?
   6. Is the structure as a whole well-proportioned?
   7. Are the outlines pleasing?
   8. Is it well finished with an appropriate finish?

H. Manufacture of wood products.
   1. The location of important manufacturing concerns.
   2. The division of labor in industry.
   3. The use of automatic machinery.

I. Joints.
   1. Types of joints, where used, and why.

J. Hardware.
   1. Types of hinges and their uses.
   2. Types of latches and where used.
   3. Types of locks and where used.
   4. Types of nails and where used.
   5. Special types of fittings.

K. Abrasives.
   1. Kinds of grinding and sharpening stones, their grades and uses.


The American Vocational Association committee on standards of 
accomplishment in industrial arts has aptly referred to knowledge of 
the informational type as "the things you should know." The examples
adapted from the committee's report are not given as necessarily
complete either by the committee or the present authors, but
they are suggestive of the type of information that can be tested by
the customary measurement techniques. The construction of devices
for the measurement of information is discussed and illustrated in
Chapter XI.

Quality. The quality of a project depends on how well the various
tool operations have been executed, assuming, of course, a constant
quality of material. It is the composite result of the type of material
used and the skill with which it was worked. Quality involves such
factors as squareness, roundness, finish, fasteners, exactness of dimensions,
accurate placement of parts, etc. To secure a reliable measure
of quality, it is usually necessary to employ a performance test. The
pupil or pupils being tested make a standard project which gives
samples of their work with different tools, instruments, or machines.
The results obtained are then rated on suitable scales of quality, and
thus the teacher is able to obtain a reasonably objective estimate of
quality of work. A further discussion of the measurement of quality
is given in Chapters XI and XII.
Quality may be scored by physical measurement, by the use of
scales, and by general observations of the student's procedure and
product. The following are examples of measurable qualities from
several industrial education subjects. The examples reported in Table
17 are taken from courses of study, committee reports, and the results
of job analysis. They are not given as complete but should prove
suggestive for test workers and teachers who may desire to construct
performance tests.

TABLE 17

EXAMPLES OF OPERATIONS THAT DETERMINE QUALITY IN INDUSTRIAL EDUCATION
SUBJECTS

Subject                 Operations

1. Woodwork             Planing, joining, nailing, screwing, sawing,
                        measuring, shaving, scraping, boring,
                        chamfering, chiseling, gouging, filing, gluing.

2. Auto mechanics       Tighten bolts, install cotter pins, solder,
                        measure in thousandths of an inch, fit bushing,
                        fit bearings, fit rings, fit pistons, time motor,
                        clean, grease.

3. Drawing              Closing corners, lettering, numbering, placement
                        of drawing on page, measuring, neatness,
                        dimensioning, circles, ellipse, irregular figures.

4. Electricity          Skinning wire, soldering, splicing, insulating,
                        cutting wire, bending and straightening wire,
                        installing conduit, installing binding posts,
                        switch installation, drilling metal, concrete and
                        brick, boring wood.

5. Home mechanics       Soldering, cutting wire, splicing, insulating,
                        skinning wire, attaching wire to binding posts;
                        drilling metal, brick, and concrete; planing,
                        boring wood; chiseling, nailing, screwing, sawing,
                        cutting pipe, tightening pipe joints, tightening
                        belts, splicing belts, filing; applying varnish,
                        enamel, and paint.


Technique. In general, the individual who manipulates the tools,
machines, or instruments in the shop or at his desk in the most direct
and efficient manner has the best technique. A pupil with good technique
can improve his skill to the maximum of his ability by practice.
Before technique can be measured it is necessary to know just what
constitutes the best technique for using different tools and machines
for different purposes. To measure technique, therefore, it is necessary
to emphasize the best shop procedures for the tools used in the
course and then check the technique of the pupil being tested through
observation. Tests of this type are especially helpful in diagnosing
pupil difficulties in the manipulative phases of the course of instruction.
Tests of technique should prove of special value in certain types
of trade- and continuation-school classes.

Speed. Speed is determined by the time it takes to do a job with
the quality held at a standard. Unless the quality is held constant,
the time required to do a piece of work is not a suitable basis for
determining speed. Speed has considerable importance in trade courses,
but in cultural courses of the junior-high-school type it is of much
less significance. If a student has good technique he can develop his
maximum speed through practice. At the present time the authors
do not know of any suitable tests of speed which are available for
general use in industrial education.
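
The definition of speed given above, time with quality held at a
standard, can be illustrated with a brief sketch in Python. It is
offered only as an illustration under stated assumptions: the 1-to-10
quality scale, the pupil labels, and the times are invented for the
example, and treating all work at or above the standard as comparable
is a simplification of holding quality constant.

```python
# Illustrative sketch only: the names, the 1-to-10 quality scale, and the
# sample figures below are assumptions, not data from any published test.
from dataclasses import dataclass


@dataclass
class Trial:
    pupil: str
    minutes: float  # time taken to finish the standard job
    quality: int    # rating of the finished product on an assumed 1-10 scale


def speed_ranking(trials, quality_standard):
    """Rank pupils by time, counting only work that meets the standard."""
    acceptable = [t for t in trials if t.quality >= quality_standard]
    return sorted(acceptable, key=lambda t: t.minutes)


trials = [
    Trial("A", 42.0, 8),
    Trial("B", 30.0, 4),  # fastest, but below standard, so not comparable
    Trial("C", 37.5, 7),
]

for t in speed_ranking(trials, quality_standard=7):
    print(f"{t.pupil}: {t.minutes} minutes")
```

Pupil B finishes first but is left out of the ranking, since his time
was obtained at a lower quality and so gives no suitable basis for
judging speed.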

Reading Technical Symbols. It is necessary to read drawings and 
interpret various types of symbols in industrial education. In life and 
in school it is often of more practical use to be able to read symbols 
than to make them. In a trade course where the objective is strictly 
vocational, both the making and reading of symbols may be of im- 
portance. In order to get an adequate measure of a pupil's ability 
to read drawings or symbols, it is necessary to construct objective 
tests which give the pupil an opportunity to read enough drawings 
to determine his ability in this respect. No adequate tests of this 
type have appeared thus far for general use, but sufficient work has 
been done to demonstrate their feasibility. 

Reading. Teachers of industrial education should quickly discover 
the reading abilities and limitations of their students. Those who are
deficient in reading should be given remedial help designed to overcome
the difficulty, rather than be penalized by a poor mark because they
are unable to understand the printed instructions.
Since written instruction sheets are coming into common use in in- 
dustrial education subjects, reading is especially important. A num- 
ber of excellent tests of reading ability are available. 

Spelling. More written work is found in certain phases of shop 
work than formerly, and the shop teacher should assume his share of 
the responsibility for the elimination of spelling difficulties. Certain 
technical terms should be mastered by the pupil so that he has no
difficulty in pronouncing or spelling them correctly. Such training should
be included as part of the regular course. Considerable research has 
been done in spelling, and many good spelling tests are available for 
determining spelling difficulties. Suitable spelling tests can easily be 
constructed for the purpose of measuring the pupils’ mastery of the 
technical words used in the respective courses. Industrial education
teachers will do well to keep the pupil's spelling ability entirely
distinct from achievement in the course.

Mathematics. Many shop courses involve related mathematics.
The student usually has some mathematical background when he
comes into the shop, but often many details have been forgotten.
Accordingly, there are ordinarily wide differences in the abilities of
the pupils in the class. Some may be able to do the related mathematics
whereas others may need remedial instruction. A few tests
in shop mathematics have appeared, but they have been more general 
than specific. There is still a need for tests which treat the mathe- 
matics needed for special industrial education subjects. Some indus- 
trial subjects in which related diagnostic tests in mathematics may 
be used effectively are printing, electricity, woodwork, auto mechanics, 
sheet metal, and machine shop. 

Appreciation of Industrial Products. One of the major objectives 
of junior-high-school industrial arts is the development of the con- 
sumer’s appreciation of industrial products. The majority of people 
are more frequently buyers of industrial commodities than they are 
the actual producers. This means that an appreciation of the prod- 
ucts of industry and the trades is important and should be measured. 
Little usable material is now available for the measurement of this 
factor. It, therefore, presents an unusual challenge to test workers in 
industrial education. Psychologically, the selection of an article in- 
volves the making of a judgment based on the composite of several 
variable factors. For example, if it is desired to select a kitchen chair, 
the following factors might come into consideration: material, cost, 
utility, design, weight, strength, and finish. It is obvious that a
consumer must have a knowledge of the qualities of the commodity, and
some experience in evaluating them, before an adequate selection can 
be made. At the present time a rating scale seems the most satisfac- 
tory means of developing and measuring consumers’ appreciation. 
Specific suggestions on the construction of such rating scales are given 
in Chapter XII. 
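
The kitchen-chair illustration above suggests how a simple rating
scale might combine several factors into one judgment. The following
Python sketch is offered only as an illustration under stated
assumptions: the 1-to-5 scale, the equal weighting of factors, the
chair labels, and every rating shown are invented for the example and
are not taken from any published scale.

```python
# Illustrative sketch: the 1-to-5 scale, the equal weighting, and all of
# the ratings below are assumptions made for the example.
FACTORS = ["material", "cost", "utility", "design", "weight", "strength", "finish"]


def composite(ratings):
    """Average the factor ratings (1 = poor, 5 = excellent) for one article."""
    return sum(ratings[f] for f in FACTORS) / len(FACTORS)


def rank_articles(rated):
    """Return article names ordered from highest to lowest composite rating."""
    return sorted(rated, key=lambda name: composite(rated[name]), reverse=True)


chairs = {
    "chair_1": dict(zip(FACTORS, [4, 3, 5, 4, 3, 4, 4])),
    "chair_2": dict(zip(FACTORS, [3, 5, 3, 2, 4, 3, 2])),
    "chair_3": dict(zip(FACTORS, [5, 2, 4, 5, 3, 5, 5])),
}

print(rank_articles(chairs))
```

A pupil's ranking of the same articles could then be compared with
the ranking obtained from the ratings of competent judges.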

Planning. Planning involves the ability to map out a direct and 
effective method for doing a job. It is generally conceded that, before 
a workman can plan the best method for doing a job, he must have 
an understanding of the factors involved. Ability to plan is now gen- 
erally given as one of the desirable objectives to be achieved in in- 
dustrial education courses. Individual differences are as great in 
ability to plan a project as in ability in other directions. Some of 
this difference in ability to plan is due to lack of training and some 
to native capacity. The habit of attacking problems in an orderly 
manner is a valuable one in any type of occupation, and industrial 
education work offers many opportunities for its exercise. Here again 
no suitable tests of planning have appeared, although they would be 
very useful in measuring success in shop courses. Some evidence of a
student’s ability to plan, as it relates to a single subject, may be
secured by giving a number of situations which require the formulation
of a plan. The plans proposed by the students may then be compared
with an ideal solution and with proposals of other pupils having
similar backgrounds.

Language. The industrial education teacher needs to give attention
to the language difficulties of his students in order to be of greater
service to them in developing desirable oral and written language
habits. The language of industrial education courses involves essentially
the same principles as those governing correct usage in any
subject. This enables the industrial education teacher to make use
of available diagnostic tests for determining language difficulties.
Here, as in spelling, teachers of industrial education should distinguish
between achievement in industrial education courses and development
in the use of language. A pupil’s achievement in an industrial education
subject should not be penalized because of language errors.
Achievement in a course in woodworking is one thing, and the mistakes
a student makes in language are quite another. The wise and
sympathetic teacher will give the pupil special help designed to overcome
his language difficulties, rather than penalize him by lowering
his mark in industrial education achievement.

Inventiveness. Psychologically, inventiveness is similar to plan- 
ning, in that it involves the formulation of a plan or series of plans. 
It obviously requires a higher type of mental ability, even to the extent
of demanding abstract thinking with a dash of constructive imagination
thrown in. Every shop teacher has met the boy who believes
he has the solution of perpetual motion, or who knows he can improve
available shop equipment by making certain changes in the machines.
The authors have had several experiences in which boys sanding wood
by hand near the wood lathes have conceived the idea of making a
cylindrical sander by using the lathe. Probably not one of these boys
had ever seen a power sander, and each would have been greatly surprised
as well as disappointed to learn that his invention had been used
for several generations. So far, no adequate test of inventiveness has
been developed.

Personality Traits. In recent years much consideration has been 
given to the significance of personality or character traits. The 
American Vocational Association committee on standards of accomplishment
in industrial arts² refers to this phase of the work as “what
you should be,” and suggests the following traits as being worthy of
development because of their recognition as essential to success in life:
industry, cooperation, consideration of others, self-reliance, and readiness
to assume responsibility. Character development is certainly an
important phase of education for a democracy. Industrial education
teachers have many opportunities to develop these traits in their
students. Ordinarily, personality or character traits are measured by
means of a rating scale. Several such scales have been developed for
general use, but no published tests have so far appeared for rating
pupils in the shop.

² Standards of Attainment in Industrial Arts Teaching, Bulletin of the
American Vocational Association, New York, December 12, 1931, pp. 21.

Mechanical Aptitude. Mechanical aptitude may be thought of as 
the capacity of a pupil to deal successfully with mechanical devices. 
Mechanical aptitude is now generally recognized as a measurable quality.
It varies considerably among individuals and, in general, has a
low correlation with intelligence scores. A knowledge of a pupil's
mechanical ability is of value in assigning projects and in giving
guidance suggestions for industrial vocations. Considerable research 
has been done on mechanical aptitude, and several good tests are 
available. 


Intelligence. General intelligence is commonly considered the ability
of an individual to learn. A knowledge of a pupil’s general intelligence
is of considerable importance in teaching and in educational
guidance. To date, more than two hundred tests of intelligence have
been published. The majority of these have received little consideration
because of the lack of adequate validation, or the unreliable
measures resulting from their use.


SUMMARY 


Industrial education teachers need precise knowledge about their 
pupils as an aid in teaching, rating, and guidance. Fifteen important 
and measurable factors in industrial education are: information, qual- 
ity, technique, speed, reading technical symbols, reading, spelling, 
mathematics, appreciation of industrial products, planning, language, 
inventiveness, personality traits, mechanical aptitude, and intelli- 
gence. Many of these measurable factors are a distinct challenge to 
test workers and teachers in industrial education. In general, the 
measurable factors of industrial education can be tested most effectively
when they are measured individually or in a separate division
of a test.


SUMMARY EXERCISES FOR DISCUSSION 


1. Define and illustrate each of the measurable factors listed in Table 16. 


2. Outline a plan for measuring objectively a pupil's ability to plan an attack
on an industrial education problem.


3. State three or more objective questions or exercises designed to measure
inventiveness.


4. Show how reading ability is an important factor in the measurement of
achievement in industrial education.


5. Make a list of the technical words that pupils in industrial education courses
should be able to spell.


SELECTED REFERENCES 


Leavitt, F. M., “Standardized Measurements in the Field of Industrial Arts,”
Industrial Arts Magazine, Vol. 8: 132, April, 1919.

Newkirk, L. V., and Stoddard, Geo. D., The General Shop. Peoria, Illinois:
The Manual Arts Press, 1929.

Report of Committee on Standards of Attainment in Industrial Arts, Bulletin of
the American Vocational Association, Industrial Arts Section, December,
1931, pp. 20-31.

Swope, Ammon, “How to Construct Objective Tests in Industrial Subjects,”
Industrial Education Magazine, Vol. 30: 7-9, No. 1, July, 1928.


CHAPTER VI 
ADMINISTERING INDUSTRIAL EDUCATION TESTS 


29. Responsibility for Giving and Scoring the Tests. 

The matter of determining the responsibility for giving and scoring 
educational tests rests chiefly upon the function the tests are expected 
to perform. If the tests are of the narrow-function type, closely paral- 
leling the course of study taught by the teacher, they should undoubt- 
edly be given by the teacher himself. If they are designed for survey 
purposes, or if the results are to be used for experimental, supervisory, 
or research purposes, they should probably be given by some one rep- 
resenting the administrative office of the school. Since these latter 
uses of the modern educational test are by far in the minority in most 
school systems, it is obvious that most of the classroom testing will 
be done by the classroom teacher. 

An excellent generalization for determining the responsibility for 
the administration of educational tests may be stated as follows: 
Whenever the test results are of a type to provide the teacher with a 
reliable and valid basis for the discovery of individual pupil difficulties
in learning or achievement, they should so far as possible be
administered and interpreted by the teacher himself; otherwise, they 
should be administered by some other school official or a disinterested 
party. The single important exception to this general policy is the 
individual mental test, which, of course, should be given by a trained 
and experienced examiner other than the classroom teacher. 

Properly selected tests for classroom use will contain so much val- 
uable information that the teacher, in most cases, will be robbed of a 
rich opportunity to learn about his pupils and his own instructional
efficiency if he does not insist upon his right to score the papers himself.
This is particularly true of industrial arts subjects, in which the
instructor will be mainly interested in the pupil's mastery of content.
Teachers should regard the scoring of educational tests, whether they
be those selected and used by themselves with their own classes or
superimposed tests, as a personal responsibility and as an opportunity
for securing significant information which should distinctly improve
their teaching practice.


30. When to Give Tests. 

The matter of when to use an educational test in the classroom, 
like the location of the responsibility for giving it, is determined almost 
entirely by the function the test is to perform. In the period of test 
development when the tests were not so numerous and lacked suffi- 
cient reliability for individual pupil analysis, the common practice was 
to administer a test at the end of the school term. This was adequate 
to give a general picture of the end-product of instruction, but it 
failed to accomplish two very important things from an educational 
point of view. In the first place, assuming that one of the very im- 
portant functions of school training is to bring about changes in the 
quality of pupil response or the level of mastery, such a procedure 
gives no basis for such an evaluation. Progress cannot be determined 
by end-of-the-year measurement alone. In the second place, any 
weaknesses revealed by this end-of-the-year measurement are brought 
out too late to permit anything to be done about them. Remedial 
and corrective instruction under these conditions of measurement is 
impossible. Accordingly, many teachers are now making use of simi- 
lar forms of tests at the beginning of the year as a check on initial 
status, and again at the end of the year as a measure of the year's 
accomplishment. This type of measurement permits an evaluation of 
initial status and the relative efficiency of instruction during the year, 
as well as presenting a fairly accurate picture of pupil growth in mastery
during the period.
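
The use of similar test forms at the beginning and at the end of the
year can be illustrated with a short Python sketch. It is offered
only as an illustration under stated assumptions: the pupil names and
scores are invented, and the two forms are assumed to be of equal
difficulty so that the difference in scores can be read as growth.

```python
# Illustrative sketch: pupil names and scores are invented for the example,
# and the two forms of the test are assumed to be of equal difficulty.
from statistics import mean

initial = {"Adams": 31, "Baker": 45, "Clark": 38}  # beginning-of-year form
final = {"Adams": 52, "Baker": 58, "Clark": 41}    # end-of-year form


def gains(before, after):
    """Gain in score for each pupil who took both forms."""
    return {p: after[p] - before[p] for p in before if p in after}


pupil_gains = gains(initial, final)
class_gain = mean(list(pupil_gains.values()))

print(pupil_gains)
print(round(class_gain, 1))
```

The initial scores show where each pupil stood at the start, the
individual gains show growth in mastery, and the average gain gives a
rough indication of the efficiency of instruction for the class.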

The development of extensive and detailed tests based upon a much 
more critical analysis of the different fields of instruction gave rise to 
a more refined idea of the use of tests. Clearly, the use of the test 
at the beginning of the term was justified mainly by the fact that
it made it possible for educational progress or improvement to be eval- 
uated. The end-of-the-year test was justified largely on the same 
basis, for it certainly did not provide any adequate basis for corrective 
instruction since the pupils involved were likely to be out of the hands 
of the teachers by the time the tests themselves were corrected and 
interpreted. The next logical step in the use of tests, therefore, was 
to construct many tests measuring a rather limited unit of instruc- 
tional material. These tests made it possible for the teacher to make 
an immediate check-up on the efficiency of instruction as soon as the 
teaching of a particular unit of subject-matter was completed. The 
fact that these tests were each standardized as of the end of the par- 
ticular instructional period involved made them especially useful to 
the teacher as the basis for organizing corrective and preventive work. 

The narrow-function unit-type tests have been slow to develop in
the industrial education field although several useful contributions
have been made. For the most part these tests are not standardized, 
nor has their reliability of measurement been critically checked. How- 
ever, they present useful material and should point the way to addi- 
tional contributions in this field. The validity of the tests has been 
determined with much more care than has the reliability. It is prob- 
ably safe to assume, however, that the reliability, low as it may be in 
certain cases, is much better than the teacher’s subjective judgment. 

The Nash-Van Duzee Instructional Review Tests in Mechanical
Drawing¹ are a good example of a series of short tests based on a
carefully validated group of instructional divisions.

These tests include the following subject-matter units which were 
selected after an analysis of textbooks, courses of study, and drawing 
teachers’ judgments: drawing instruments and their use, terms and 
definitions, lettering, orthographic drawings, working drawing, sec- 
tions, graphs, inking technique, construction problems, developments, 
materials, screw threads, conventions, fastenings, pictorial drawing, 
architectural drawing, gears, cams, detailed drawing, and assembly 
drawing. 

The Hunter Shop Tests² [...] instructional units in differe[...]
[...]ever, the validity has not [...] representative courses of study
and cannot be used for the detailed analysis that is possible with the
Nash-Van Duzee Instructional Review Tests in Mechanical Drawing.

[...] type tests must be [...] the year. Naturally, this type of
test[...] in terms of time, pupil-teacher effort, [...] there are
seri[...]


¹ Nash, Harry B., and Van Duzee, R. B., Instructional Review Tests in
Mechanical Drawing, Bruce Publishing Company, Milwaukee, Wisconsin, 1930.
² Hunter, Wm. H., Hunter Shop Tests, Manual Arts Press, Peoria, Illinois.


The teacher has a right to expect a tangible return in the form of 
supervisory suggestions and remedial helps for his class for any time 
spent in taking, giving, scoring, or interpreting educational tests. 


31. Controlling the Variables in Testing. 


Most modern tests on the informational aspects of industrial edu- 
cation are constructed in such ways that almost any shop teacher who 
is reasonably skillful in maintaining the discipline of his class and who 
will follow the directions accompanying the tests can administer them 
without difficulty. It is always desirable, of course, that the teacher 
should become very familiar with the examiner’s manual before at- 
tempting to give any kind of test. If the examiner is inexperienced 
in the giving of tests he should try the test out on someone before 
giving it to his class. If this is impossible, the test itself and the direc- 
tions for administering it should be read through several times before 
attempting to give it to a class. 

The following general suggestions may be useful to the individual 
not widely experienced in the administration of tests: 


1. Before beginning the tests have the desks cleared and see that 
each pupil is provided with one or more pencils. Have a num- 
ber of extra pencils available for emergencies. 

2. The room should be quiet throughout the tests. Require strict 
attention to the directions, and see that the pupils follow your 
commands at once. If the group tested is large, additional 
proctors may be necessary. They should move quietly about 
the room and see that all pupils get started correctly and 
together. 

3. The examiner (and proctors) should pass down the aisles and 
place a test booklet on the desk of each pupil with the cover 
page (page 1) facing the pupil. If the tests are in mimeo- 
graphed form, place the folders face down and instruct pupils 
to leave them in that position until they are told to fill in the 
blanks at the top of the first page. 

4. All directions to the pupils should be given carefully in a tone 
which carries proper emphasis and suggests authority. The 
voice should be just loud enough to be heard in all parts of the 
room used for testing. 

5. Follow the directions of each test strictly, and adhere rigidly 
to the time limits. A stopwatch is highly desirable for timing 
the tests. 




6. See that all pupils start and stop instantly upon the signal. 
Students should be instructed that, should they finish a test 
before time is called, they may go over their work and look 
for mistakes. 


32. Administering Manipulative Tests. 


Manipulative or performance tests present problems of administra- 
tion which differ in certain respects from the objective pencil-and- 
paper tests of information. In a performance test the pupil modifies 
materials with tools, makes a drawing with instruments, or applies a 
coating to a surface. The performance test is a recognition test in 
which the pupil selects the tools or instruments which are already 
available and turns out a product which can be rated objectively and 
compared with other, similar products. 

Like tests of the paper-and-pencil variety, manipulative tests usu- 
ally have prepared sets of directions which must be studied by the 
teacher and carefully followed. It is also advisable for the teacher 
to practice giving a performance test to individual pupils or to a small 
group before attempting to give it to an entire class. Manipulative 
tests require very careful supervision in order that the resulting prod- 
uct may be uniform and thus lend itself to objective rating. 

The authors have found the following points helpful in administer- 
ing manipulative tests: 


1. Read aloud and distinctly the directions to pupils while the 
class follows silently. 

2. Answer all questions about the directions before the test is 
begun. 

3. Show the pupils a completed test-product, and, if they care to, 
let them examine it. 

4. If there are no further questions, say, "Get ready. First tool 
or instrument up. Go." (Record time.) 

5. During the examination answer any questions about the steps 
in the procedure by reading the steps in question with the 
pupil. 

6. Observe the pupils as they work to make certain they are all 
doing the steps in the correct order. 

7. Make certain that the proper tool or instrument is used where 
indicated, but do not tell the pupils how to use the tool. 

8. Help any pupil having trouble in reading working drawings, but 
do not make measurements on his problem for him. 




Other factors which the authors have found important to consider 
in administering a manipulative test are: (1) the condition and place- 
ment of tools or instruments, (2) quality of materials, (3) lighting, 
and (4) convenience and comfort of place to work. The tools used 
must be of suitable size, properly adjusted, and uniformly sharp for 
each student taking the test. The materials to be used should be of 
uniform quality and free from defects. If one pupil has a piece of
knotty oak and another one a piece of clear gum-wood, it is obvious
that the two pupils are not working under comparable conditions and
have not the same chance to obtain good results. It is conceivable
that the pupil with the knotty oak might do better work and yet get
a lower rating. Proper lighting is also very important in a performance
test, because it is necessary for the pupils to get clear images of
their work. A suitable light for a manipulative test is 12 to 15
foot-candles of illumination on the bench top or drafting-board surface.
It is also well to be certain that all pupils taking a manipulative test
have normal eyesight. The benches or drafting boards should be of
proper height for the individuals taking the test. It too frequently
happens that extremely tall or short pupils are handicapped by too
low or too high a working surface.


33. Scoring Manipulative Tests. 

Manipulative tests are scored by physical measurements, rating
scales, and observation of experts. It is obvious from this statement
that the products of manipulative tests cannot be scored as objectively
as the pencil-and-paper tests. However, the scoring is much more
objective than the teacher’s subjective judgment, and usually reasonable
objectivity can be secured. The most objective scores are obtained
where physical measurements can be used. If a pupil cuts a board
13½ inches long when it should have been 14 inches, the board can
be measured and the amount of the error determined in fractions of an
inch. This is as objective as the response to a true-false question.
A soldered joint, a rope splice, a glued joint, the setting of a
flat-head screw, and the tying of an underwriter’s knot are also
modifications of materials, but they are the result of an almost limitless
number of small variables which produce a quality of workmanship
that is not easily measured by instruments or readily judged by
experts without means of comparison. Since this is true, joints, splices,
lettering, etc., can be rated most objectively by comparing them with
samples of known quality. This rating process is referred to as using
a rating scale. The construction and use of rating scales are discussed
in detail in Chapter XII.
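
The physical-measurement case lends itself to a brief illustration. The
Python sketch below computes the error of a cut in exact fractions of an
inch, taking a board specified at 14 inches and, for illustration, a cut
length of 13 1/2 inches. The function names and the deduction of one point
per sixteenth of an inch are hypothetical; they are not taken from any
published scoring key.

```python
from fractions import Fraction


def length_error(measured, specified):
    """Absolute difference between measured and specified length, in inches."""
    return abs(Fraction(measured) - Fraction(specified))


def score_cut(measured, specified, full_credit=10, penalty_per_sixteenth=1):
    """Deduct a fixed penalty for each 1/16 inch of error (hypothetical scale)."""
    sixteenths = length_error(measured, specified) / Fraction(1, 16)
    return max(0, full_credit - penalty_per_sixteenth * sixteenths)


# Board specified at 14 inches, measured at 13 1/2 inches.
error = length_error(Fraction(27, 2), 14)
points = score_cut(Fraction(27, 2), 14)
print(f"error = {error} in., score = {points}")
```

Exact fractions are used rather than decimals so that the error is stated
in the same units in which a shop rule is read.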




There are other characteristics of the results of manipulative tests
which do not lend themselves to physical measurement, but which can
be rated quite objectively by experienced observers. Examples of
these are letters transposed in printing, or loose wires in electrical
circuits. The important thing in achieving objectivity in scoring
manipulative tests is to use the techniques of physical measurement,
quality scales, and experienced observers in rating those results to
which they are adapted. If this is done the objectivity of the scoring
of manipulative tests may be made quite satisfactory.


34. Responsibility for Training in the Use of Scales.

The use of scales for the more or less objective rating of qualities 
or products has been pointed out as an important phase of measure- 
ment in industrial education. In fact, many outcomes of shop work 
are measurable in no other way than by rating scales. For example, 
the quality of a soldered joint in metal work or of a mitered joint in 
woodwork is not readily evaluated objectively except through the use 
of a scale. Experience with such rating scales makes it apparent, 
however, that reliable results cannot be obtained from their use by un- 
trained and inexperienced judges. Brief courses of training in the use 
of the scales result in distinctly reducing the unreliability of measure- 
ment resulting from the subjective factors. Classroom teachers can be 
trained in the use of handwriting scales, freehand drawing scales, com- 
position scales, and doubtless many other kinds of scales, to the point 
where the average error or deviation from a known quality rating will 
not exceed five points on a hundred-unit scale. This is probably not a
serious inaccuracy in such measurement. It may be inferred therefore
that similar training periods must be provided for industrial education
teachers who are desirous of using rating scales in this field. Increased
reliability of measurement may be expected as a result of such
training.

The head of the department of industrial education in a large 
high school may well assume the responsibility for giving his teachers 
a brief course of training in the use of rating scales. Typical samples 
taken from the chosen field may be used for this practice. Preferably, 
samples representing a wide range of quality should be chosen. If 
the samples are selected from the products of a class and the true 
quality scores of the samples are not known, the average ratings given 
by a group of six or seven teachers may be taken as the basis for 
adjustment. Judges whose ratings deviate most widely from the com- 
posite or the true values should be asked to rerate their samples, 
making certain conscious adjustments in their mental standards of 




quality until they conform quite closely to the standards of the group. 
Considerable experience in working with training-groups in the use of
such scales indicates that certain individuals readily adjust their ideas
or standards of quality to those of the scale. There is some slight
evidence that such individuals, being gifted with greater discriminative
power, are usually found among the more able groups of teachers.
For such individuals a brief period of training is adequate. For the
average teacher, inexperienced in the use of such scales, as many as
two or three hourly periods of drill in the rating of the selected
specimens may be necessary before a satisfactory level of accuracy of
measurement is reached.
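
The training procedure just described can be sketched in Python. The
example takes the average rating of a group of judges as the composite,
computes each judge's average deviation from it, and lists the judges whose
deviation exceeds five points on a hundred-unit scale. The ratings, the
judge labels, and the function names are hypothetical and serve only to
show the arithmetic.

```python
from statistics import mean


def composite_ratings(ratings_by_judge):
    """Average the judges' ratings sample by sample."""
    return [mean(sample) for sample in zip(*ratings_by_judge.values())]


def average_deviation(ratings, reference):
    """Mean absolute deviation of one judge's ratings from the reference."""
    return mean(abs(r - ref) for r, ref in zip(ratings, reference))


def judges_needing_rerating(ratings_by_judge, reference=None, tolerance=5):
    """Judges whose average deviation exceeds the tolerance (in scale points)."""
    if reference is None:  # true quality scores unknown: use the composite
        reference = composite_ratings(ratings_by_judge)
    deviations = {judge: average_deviation(ratings, reference)
                  for judge, ratings in ratings_by_judge.items()}
    return {judge: d for judge, d in deviations.items() if d > tolerance}


# Hypothetical ratings of four samples by six judges, hundred-unit scale.
ratings = {
    "A": [40, 60, 75, 90],
    "B": [42, 58, 77, 88],
    "C": [38, 62, 73, 92],
    "D": [41, 59, 76, 89],
    "E": [39, 61, 74, 91],
    "F": [55, 75, 90, 100],
}
flagged = judges_needing_rerating(ratings)
print(flagged)
```

When true quality scores for the samples are known, they can be passed as
`reference` in place of the composite.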


35. Scoring Pencil-and-Paper Tests. 

One of the important distinguishing features of objective
paper-and-pencil tests is their very objectivity. Objectivity in a test
implies little or no variability in the acceptable answers. Objective
tests should be scored in exact accordance with the scoring key. The
directions should be followed rigorously and the tests scored exactly
according to instructions, even though they may run counter to the
user’s best judgment. Unless this care in scoring the tests is taken,
it is impossible and improper to make comparisons of the test results
with the norms or standards which have been derived under controlled
conditions. Errors in scoring and transcribing test scores are best
eliminated by rechecking all such work and by performing all related
calculations at least twice. Special care should be taken where the
results are to be used for experimental purposes or for individual
pupil analysis.

The remaining phases of the administration of tests in the classroom
are essentially statistical and interpretational in character and
as such are reserved for discussion in Chapters XIV and XV.
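
Scoring in exact accordance with a key, and checking the work a second
time, can be illustrated with a short Python sketch. The key, the pupil
names, and the second set of scores are hypothetical; the second scoring
stands for an independent recount, by hand or by another scorer.

```python
def score_with_key(responses, key):
    """Count the responses that agree exactly with the scoring key."""
    if len(responses) != len(key):
        raise ValueError("response sheet and key differ in length")
    return sum(1 for given, correct in zip(responses, key) if given == correct)


def disagreements(first_scoring, second_scoring):
    """Names of pupils whose two independent scorings do not agree."""
    return sorted(
        pupil for pupil, score in first_scoring.items()
        if second_scoring.get(pupil) != score
    )


key = ["T", "F", "F", "T", "T"]                 # hypothetical true-false key
sheets = {
    "Adams": ["T", "F", "T", "T", "T"],         # hypothetical answer sheets
    "Baker": ["T", "F", "F", "T", "F"],
}

first_scoring = {pupil: score_with_key(r, key) for pupil, r in sheets.items()}
second_scoring = {"Adams": 4, "Baker": 3}       # hypothetical recount
print(first_scoring, disagreements(first_scoring, second_scoring))
```

Any pupil named by `disagreements` is rescored before the result is
entered on the class record.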


SUMMARY 


The matter of determining the responsibility for the giving and
scoring of educational tests rests to a large degree upon the use to be
made of the test results. The questions of when to use an educational
test and what kind of a test to use are answered almost entirely by
the function the test is to perform.

Tests in the industrial education field are broadly divided into (1)
tests of information and (2) tests of performance. Information tests
are commonly of the paper-and-pencil variety, calling for evidence of
a mental reaction. Performance tests are manipulative and constructive
in character, calling upon the student to apply tools and skills
to materials, and produce tangible objects of varying quality in
accordance with certain definite specifications. Conditions under which
tests of both types are administered must be carefully controlled if the
results are to be meaningful.


SUMMARY EXERCISES FOR DISCUSSION 


1. What should be the classroom teacher's responsibility for the administration
and interpretation of industrial education tests?
2. What factors determine primarily what tests to give and when to give them?
3. In what specific points does the administration of the objective test of the
paper-and-pencil type differ from that of the manipulative type of industrial
arts test?
4. How may the scoring of manipulative tests be objectified?
5. What does the evidence show regarding the influence of rating scales for shop
products on the reliability of marks assigned? Does training in the use of
such scales appear to pay?




CHAPTER VII 
INDUSTRIAL EDUCATION ACHIEVEMENT TESTS 


Selected tests for certain fields of industrial education are described 
and evaluated in this chapter. In general, the tests named have been 
published and widely distributed during the past few years. Many 
are quite satisfactory and are in most respects the equal of good tests 
in other fields, but quite a number are pioneer efforts and are in- 
cluded more because they are suggestive for future development along 
more scientific lines than for their present merit. Many more care- 
fully prepared standardized tests in industrial education are needed 
before the field will be as well covered as other fields of instruction. 
No attempt has been made to include all the available tests, but tests
which seem to have special values for industrial education teachers
have been selected from the different instructional fields.


36. Achievement Tests in Industrial Education. 

In order to measure achievement in industrial education, it is neces- 
sary to measure information and ability to perform tasks involving 
the use of tools, machines, and materials. Ability to perform a task 
does correlate with knowledge, but the relationship does not seem to 
be sufficiently close to warrant the use of the pencil-and-paper-type 
test to measure all types of achievement in industrial education. For 
example, a pupil may know how to do a job and be able to do it if
given the opportunity, and yet make a poor score on a pencil-and-paper
test because he does not know the technical vocabulary. Another
pupil may know the procedure and the vocabulary but lack the
tool skill necessary for the execution of the project.

The fact that the work in industrial education is not well
standardized from school to school has been pointed out by many test
workers as an insurmountable obstacle in the way of the construction
of standardized industrial education tests, and it has been contended
that further test construction should wait until the work is more
definitely standardized. On the surface this seems logical, but in fact it
has little foundation because, in other fields of instruction, the research
work needed to validate a test has been one of the chief influences
tending to standardize curricular content, and establish levels of
accomplishment. In industrial education, as in other subjects, it is
necessary to make a careful study of teaching practice, textbooks, 
courses of study, committee reports, and, in many cases, to make ex- 
tensive analysis of the subject to be tested. All these studies lead 
toward a better understanding of the content in any of the subjects 
selected for test construction. These validation studies have a marked 
influence on teaching practice because they are put in the form of a 
test and the teacher can determine in part whether or not his course 
is valid by comparing it with the items in the test and the median 
results obtained with those of other schools. It is obvious, therefore, 
that standardized tests of achievement with their attending validation 
studies are one of the strongest influences tending to define and set 
up standards of accomplishment in industrial education courses. It is 
of course difficult to establish norms that are of great value, but norms 
can be revised as the work becomes more uniform through the use of 
validation studies. 


37. Standardized Industrial Arts Tests. 


In this section four widely used standardized industrial arts tests 
are briefly described. 


1. NASH-VAN DUZEE WOODWORK TEST 1, SCALE A ¹


This is a test designed to measure the junior- and senior-high- 
school pupil’s knowledge of processes, tools, materials, and information 
used in woodworking. Five different types of questions are used in 
the test with appropriate directions for each type. The test is printed 
in a neat, eleven-page booklet and is accompanied by a manual of 
directions, objective scoring key, and class record card. In general 
the test has been carefully constructed. 

Validity. The validity of the test was based on an analysis of
teaching content as obtained from courses of study, textbooks, surveys, 
reference books, and trade analyses. Apparently the validity of the 
test is satisfactory in the light of common practice. 

Reliability. The coefficient of reliability based on 200 cases was 
found to be .86, using the chance-half method and employing the 
Spearman-Brown formula for estimation of the reliability of the whole 
test. The reliability of the separate divisions of the test on the same 
number of cases is given in Table 18. Although the reliability of this 
test is a little below the best academic tests of the same type, it is 
very satisfactory for measuring achievement for comparative purposes. 
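
The Spearman-Brown estimate mentioned above can be written out in a few
lines of Python. The sketch steps the correlation between two half-tests
up to an estimate for the whole test, and also works backward from a
whole-test coefficient. The half-test correlation of .75 is illustrative
only; the source reports the whole-test coefficient of .86 but not the
half-test figure.

```python
def spearman_brown(r, factor=2):
    """Estimated reliability when a test is lengthened by the given factor."""
    return factor * r / (1 + (factor - 1) * r)


def half_test_correlation(r_whole):
    """Half-test correlation implied by a whole-test reliability (factor 2)."""
    return r_whole / (2 - r_whole)


r_half = 0.75                               # illustrative half-test correlation
r_whole = spearman_brown(r_half)            # estimate for the full-length test
print(round(r_whole, 2), round(half_test_correlation(0.86), 2))
```

The same function, with a different `factor`, estimates the effect of
lengthening or shortening a test by any ratio.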


1 Nash, Harry B., and Van Duzee, Roy R., Woodwork Test 1, Scale A, Bruce 
Publishing Company, Milwaukee, 1927. 


TABLE 18

RELIABILITY COEFFICIENTS FOR NASH-VAN DUZEE WOODWORK TEST

Part          Reliability Coefficient
I A                   .61
I B                   .80
I Total               .85
II                    .80
III                   .94
IV                    .88


Norms. Norms are reported on 3000 cases. They are given on the
basis of semesters and number of minutes of instruction from the first
semester of the junior high school through the first two semesters of
high school. A few cases are also reported on training-school students.
The norms are likewise given in the form of percentiles with
a corresponding marking scale. The methods employed in securing
and reporting these norms should be very suggestive to other test
workers in industrial education.


2. NASH-VAN DUZEE WOODWORK TEST 1, SCALE B ²


This is a test for the purpose of measuring the pupil’s skill in
manipulating hand woodworking tools. It is a companion test of Test 1,
Scale A, which is a test of information rather than performance. The
test is suitable for measuring manipulative achievement in junior- and
senior-high-school hand woodwork. Nash and Van Duzee ³ state in
the manual that “the test aims to measure the pupil’s understanding
of directions involving frequently used woodworking processes and
procedures, the reading of a working drawing, the selection of proper
tools to carry out the specified work and the ability to use the tools
selected to do the required work.”

Validity. The skills for the test were selected after analyzing
courses of study, textbooks, problem books, and blueprints used in
junior and senior high schools. The items selected are representative
of general practice, but there is a question as to whether enough of
each type of item is given to really sample the pupil's ability.


2 Nash, Harry B., and Van Duzee, Roy R., Woodwork Test 1, Scale B, Bruce
Publishing Company, Milwaukee, Wisconsin, 1928.
3 Nash, Harry B., and Van Duzee, Roy R., Manual of Directions, Industrial
Arts Test 1, Scale B, Bruce Publishing Company, Milwaukee, Wisconsin.




Reliability. The reliability of this test is reported as varying from 
.60 to .80 with an average of about .73. This is higher than a 
teacher's subjective judgment but is probably too low for a first-class 
standardized test. 

Norms. Median and percentile norms based on a few hundred 
cases are available. 


3. NEWKIRK-STODDARD HOME MECHANICS TEST ⁴


The chief purpose of the Newkirk-Stoddard Home Mechanics Test 
is to measure in an objective and analytical manner the essential 
knowledge that the pupils should acquire from a well-organized course 
in home mechanics. The test is divided into two closely equivalent
forms, A and B. Each form contains 36 jobs, comprising a test of
half the outstanding jobs in home mechanics. It is divided into
Forms A and B so that it will be easier to administer and will more
nearly fit the various needs of home mechanics teachers.

Validity. Four criteria were used to establish the validity of the 
test: 


1. Surveys to check the jobs on the basis of social utility. 

2. Surveys of the actual teaching content of 75 representative 
schools. 

3. Analysis of course of study and widely used commercial job 
sheets which were based on surveys. 

4. Selection of jobs with procedures representative of a class of 
jobs rather than just a single job. 


Scoring. The test is objective in its scoring. Each form has a
printed key in which the correct responses for Part I are placed around 
the margins and the correct diagrams for Part II are reproduced on 
the back. This scoring key contains full directions for its use. The 
key sheet requires no cutting and is easy to use. No corrections for 
chance are required. 

In Tables 19 and 20 are given statistical measures which indicate 
the consistency of measurement by the test. 


4 Newkirk, Louis V., and Stoddard, George D., Newkirk-Stoddard Home 
Mechanics Test, Bureau of Educational Research and Service, University of 
Iowa, Iowa City, 1928. 




TABLE 19

RELIABILITY OF A SINGLE FORM (A OR B), 40 MINUTES’ TESTING

                            r               σ           P.E. Score
Grade    No. in        Job   Point     Job   Point     Job   Point
         Sample

7          50          .49    .86      2.5    26       1.2    6.6
8          50          .64    .89      3.0    29       1.2    6.5
9          50          .59    .85      3.2    24       1.4    6.3
7-9       150          .54    .86      2.9    27       1.3    6.8


TABLE 20

RELIABILITY OF BOTH FORMS (A + B), 80 MINUTES’ TESTING

                            r               σ           P.E. Score
Grade    No. in        Job   Point     Job   Point     Job   Point
         Sample

7          50          .66    .92      4.5    50       1.6    9.5
8          50          .78    .94      5.5    56       1.5    9.3
9          50          .45    .92      5.7    45       1.9    8.6
7-9       150          .70    .93      5.4    53       2.0    9.4
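
The source does not state how the P.E. Score column was obtained. A
formula in common use for the probable error of a score is
P.E. = 0.6745 σ √(1 − r). Assuming that formula, and reading the columns
of Table 19 as r, σ, and P.E. (itself an assumption about a damaged table
heading), the Python sketch below appears to reproduce the grade 7 entries
of Table 19. The function name is hypothetical.

```python
from math import sqrt


def probable_error_of_score(sigma, r):
    """Probable error of a score from the standard deviation and reliability."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability coefficient must lie between 0 and 1")
    return 0.6745 * sigma * sqrt(1.0 - r)


# Grade 7, single form: point scores (sigma 26, r .86) and job scores (sigma 2.5, r .49).
pe_points = probable_error_of_score(sigma=26, r=0.86)
pe_jobs = probable_error_of_score(sigma=2.5, r=0.49)
print(round(pe_points, 1), round(pe_jobs, 1))
```

Read in this way, a probable error of about 6.6 points means that roughly
half of the obtained scores would fall within 6.6 points of the pupil's
true score.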


Norms. Table 21 recapitulates preliminary norms obtained from
grades 7, 8, and 9.

TABLE 21

NORMS, FORMS A AND B, MAY TESTING (N = 396)

                              Jobs                 Points
                        Form A   Form B       Form A   Form B

Mean                     5.2      ze           147      146
Median                   4.8      5            77.9     73.0
Upper quartile           10       16           95.8     96.3
Lower quartile           3.2      3.1          60.5     54.5
Standard deviation       2.9      3.4          25.1     27.9




4. WELLS-LAUBACH INDUSTRIAL ARTS TESTS ⁵


Tests in woodwork, printing, machine shop, and mechanical drawing
are included in this series. All the tests are of the pencil-and-paper
type and with the exception of one in mechanical drawing are
made up of 100 true-false statements. Twenty-five minutes is the 
working time for the woodwork and printing tests, 20 minutes for the 
machine shop, and 30 minutes for mechanical drawing. 

Validity. The content of the tests parallels teaching practice in 
a general way but no scientific means of determining validity is re- 
ported. 

Reliability and Norms. No statistical data are given on the tests, 
but tentative norms based on the median accomplishment in about 
1000 cases are reported. The tentative norms are given on the basis of 
four semesters for each of the four tests. 


38. Non-Standardized Industrial Arts Tests. 


1. HUNTER SHOP TESTS, SERIES 1 AND 2 ⁶


Hunter has developed 32 short objective tests. Each test includes 
25 objective questions on the particular subject measured. The fol- 
lowing is a list of these objective tests according to the subjects or 
parts of subjects tested. They are of the pencil-and-paper type. 


WOODWORK 
W-1 Tools test 
W-2  Fastenings test 
W-3  Comprehension test 
W-4 Trade names test 
W-5 Reading test for rule or scale 
W-6  True-false test 
W-7 Completion test 
W-8 Building parts test for carpentry 
W-9 Board measure test 
W-10 Multiple-choice test for wood and lumber 
W-11 Reading test for framing square 
W-12 Objective test for wood finishing 
W-13 Multiple-choice test for carpentry 
W-14 Multiple-choice test for pattern makers 


MECHANICAL DRAWING
MD-1 Reading test
MD-2 Missing-line test
MD-3 Lettering test
MD-4 True-false test

MACHINE SHOP
MS-1 Tool test
MS-2 Comprehension test
MS-3 True-false test
MS-4 Micrometer reading test
MS-5 Multiple-choice test

ELECTRIC SHOP
E-1 Symbols test
E-2 Objective test

AUTOMOBILE MECHANICS
AM-1 Parts test
AM-2 Multiple-choice test

PRINTING
P-1 Completion test

RELATED SUBJECTS
G-1 Shop English
G-2 Shop mathematics
G-3 Shop arithmetic
G-4 Geometry


5 The Manual Arts Press, Peoria, Illinois, 1928.
6 The Manual Arts Press, Peoria, Illinois, 1927.


Validity. The validity has been determined in a general way from
teaching experience, pooled judgment, and analysis of textbooks and
courses of study.

Reliability and Norms. The reliability is not reported, and the
tests are not standardized. They are too short to have very high
reliability individually, but in using a battery of the woodworking
tests the authors have found coefficients of reliability as high as .85.


39. Mechanical Drawing Tests.

Four tests in mechanical drawing are described in this section.


1. BADGER STANDARD TEST IN FUNDAMENTAL MECHANICAL
DRAWING, TESTS 1, 2, 3 ⁷


The author states that “these are tests of what the pupil knows
about the phases of drawing covered rather than a test of his drawing
ability measured in terms of neatness, accuracy, lettering and so forth.”
The test includes 145 exercises of the multiple-choice type. The form
has been varied to fit the types of content tested. There are three
tests. The first deals with knowledge relating to the use of instruments,
line work, dimensioning, and lettering; the second tests knowledge
of projection and includes sections and auxiliary views; and the
third measures knowledge of pictorial drawing, isometric, cabinet, and
oblique. The test does not have time limits, but the directions suggest
that the tests be collected after all but two or three of the slowest
pupils have finished. The validity and reliability of the test are
not given in detail.


7 Public School Publishing Company, Bloomington, Illinois, 1929.


2. CASTLE MECHANICAL DRAWING TEST ⁸


This test is divided into five subtests. Subtest 1 requires the pupil 
to identify similar parts of an object in top and side views by match- 
ing corresponding numbers and letters. Subtest 2 deals with dimen- 
sions; 3, with geometric terms; 4, with pencil technique; 5, with ink- 
ing. The working time for the test is 41 minutes. The first three 
parts are objective in scoring, but the last two depend to some extent 
on the teacher’s subjective judgment, although the scorer is provided 
with letter rating scales and six points are mentioned for rating the 
drawing. 

Validity. No very definite statement is given as to validity, but 
it is based on analysis of instructional materials and a long teaching 
experience in mechanical drawing. 

Reliability and Norms. The coefficient of reliability is not re- 
ported, and norms have not been established for the test. It is not 
standardized but should prove useful for measurement of drawing 
achievement in the same manner that a teacher-made objective test 
would be used. 


3. FISCHER MECHANICAL DRAWING TESTS, PARTS I AND II ⁹


Part I of this test covers the technical information necessary in 
drawing. No instruments other than a pencil are needed. Part II is 
a performance test and requires the use of drawing instruments. 
Either test can be given in a 45-minute drawing period. Part I is 
composed of four subtests and Part II of three subtests. Both parts 
of the test should be given since it is desirable to test information 
and performance. Parts I and II are not equivalent forms but are 
divisions of the same test. The test has considerable diagnostic value
as it enables the teacher to see where the pupils have succeeded and
where they have failed. The problems in the test are rated according
to difficulty. The test includes a manual, scoring key, and a class
record sheet.


8 The Manual Arts Press, Peoria, Illinois, 1928.
9 The Bruce Publishing Company, Milwaukee, Wisconsin, 1929.




Validity. The claim for validity is based on analysis of textbooks, 
blueprints, courses of study, and in addition on a survey of 100 schools 
to find out what was being taught, time being devoted to drawing,
etc. This material was tabulated under five major divisions as follows:
use of instruments, lettering, projection drawing, geometric
constructions, and pictorial representations. The content included in the
test was carefully validated on the basis of teaching practice and rep- 
resents good workmanship. 

Reliability. The coefficient of reliability was determined by giving 
the same test twice to 150 sophomores in high school. This resulted in 
a correlation coefficient of .79. This is quite low for a standardized 
test. 
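
A test-retest coefficient of this kind is the ordinary product-moment
correlation between the scores of the same pupils on the two
administrations. The Python sketch below shows the computation; the two
short lists of scores are hypothetical and are far smaller than the 150
cases reported for the Fischer test.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)


first_testing = [52, 61, 70, 48, 66]     # hypothetical scores, first administration
second_testing = [55, 58, 73, 50, 64]    # same pupils, second administration
print(round(pearson_r(first_testing, second_testing), 2))
```

The same function serves when two equivalent forms, rather than two
administrations of one form, are correlated.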

Norms. Median-score graphs are given which indicate the median 
score for all schools as represented by scores from 2500 students. The 
norms or medians of accomplishment are classified on the basis of 
minutes of instruction. The author also suggests means of using bar
diagrams and the use of test scores as a partial means of assigning
marks.


4. NASH-VAN DUZEE INDUSTRIAL ARTS TEST, TEST II,
MECHANICAL DRAWING ¹⁰


The test is designed to measure objectively performance in drawing
as well as information about mechanical drawing. The test is
suitable for use in both the junior and senior high school. The test
is divided into Part I and Part II and is available in two closely
equivalent forms. Forms I and II were equalized on the basis of
the results obtained from 500 mechanical drawing pupils in the ninth
and tenth grades. A manual of directions, objective scoring key, and
a class record sheet are provided. The scoring key also includes a
scale for the rating of ability to letter. Part I of either form can be
written in the ordinary classroom with a pencil, but Part II requires
the use of mechanical drawing instruments.

Validity and Construction. The claims for validity are based on
the analysis of textbooks, courses of study and reference books, and a
rating of the analysis by several hundred persons interested in
teaching mechanical drawing.

Reliability. The reliability of the test was found to be .87 with
203 fifth- and sixth-semester pupils. Statistically, this is quite a
satisfactory reliability since it was determined by correlating Form I
with Form II.


10 The Bruce Publishing Company, Milwaukee, Wisconsin.




Norms. Median and percentile norms indicating accomplishment 
by semesters and minutes based on 2500 cases are given for the junior 
high school and the first two years of high school. A suggestive scale 
for converting percentile norms into equivalent class marks is given 
in the table of percentile norms. Percentile curves for Forms I and II
are used to show the approximate equivalence of the two forms of the
test.


40. Trade Tests. 


Trade tests are of value to industrial education teachers who teach 
vocational courses, and they are very suggestive to teachers of the 
general educational courses of the junior high school. Trade tests 
measure trade proficiency; they are valuable in selecting men who 
possess the information and skill necessary to succeed in a given trade 
and for measuring accomplishment in advanced vocational courses. 

Chapman ¹¹ has pointed out the significant distinctions between
intelligence tests and trade tests. “While these two forms of the test, 
the mental test and the skill prediction test, both have a great sphere 
of usefulness in industry, it is very essential to precise thinking on 
the subject of industrial testing not to confuse these with the trade 
test proper. The trade test makes no pretense of measuring intel- 
ligence directly; it makes no attempt to measure the native endow- 
ment of the subject, with a view to predicting the degree of success 
to be expected as a result of training in a specific trade; the trade 
test furnishes a rating, in objective quantitative terms, of the degree 
of trade ability already possessed as a result of practice in the trade.” 

Trade tests present numerous difficulties in their construction and 
for that reason have not been entirely successful, although the better 
tests are decidedly superior to subjective judgments in selecting
qualified tradesmen. One difficulty has been that test workers have
lacked information about the abilities, techniques, skills, and
attitudes necessary for success in a given occupation, and without
that information valid measures cannot be developed. Another
difficulty is that trade tests are not always given under trade condi- 
tions, with the result that a man may succeed on the test but fail on 
the job. Trade tests are also expensive of time and money. Many 
of them are individual tests and require material and tools for the 
measurement of manipulative skills. 

Trade tests of four general types—oral, picture, performance, and 
written group tests—were widely used in the army during the World 
War to select men who were proficient in the various trades. This
procedure saved considerable time and money. Since the war many
industries have employed trade tests in selecting applicants for
positions. Tests of this type are used in vocational guidance. Trade
tests have been greatly improved and modified during the past ten
years and have been adapted to the needs of industry.

11 Chapman, J. C., Trade Tests, Henry Holt and Company, New York,
Chapter XI, p. 374, 1921.


SUMMARY 


Objective tests have appeared somewhat more slowly in industrial 
education than in certain other branches of instruction. Possibly this 
has been because of a lack of definiteness in the statements of the 
objectives of certain of the industrial courses. 

Achievement in the industrial subjects is not entirely a matter of 
information. Ability to perform a task does correlate with knowl- 
edge about the task, but this relationship does not seem to be suffi- 
ciently high to warrant the exclusive use of pencil-and-paper tests for 
the measurement of achievement in the industrial subjects.
Accordingly, performance as well as informational types of tests are
needed in this field.

SUMMARY EXERCISES FOR DISCUSSION

1. Discuss the special limitations of paper-and-pencil tests in the industrial
subjects.

2. Show how the fact that industrial education courses are not well standardized
from school to school accounts for numerous difficulties in the construction
of tests.

3. Select at least one test in the fields of woodworking, mechanical drawing, and
shop work, and present the major values and limitations of each.


SELECTED REFERENCES

Bapcem, Arex J., Standard Tests in the Fundamentals of Mechanical Drawing,
Tests 1, 2, and 3. Bloomington, Illinois: Public School Publishing Co., 1929.

Castle, Drew W., Mechanical Drawing Test, Peoria, Illinois: Manual Arts Press,
1927.

Castle, Drew W., “Mechanical Drawing Tests,” Vocational Education Magazine,
Vol. 2: 756-8, May, 1924.

Christy, Elmer W., Mechanical Drawing Scale, Peoria, Illinois: The Manual
Arts Press, 1926.

Cleeton, Glen U., “Printing Tests for the Junior High School,” Industrial Arts
and Vocational Education Magazine, Vol. 19: 329, September, 1930.

Donson, Сіковов C., “A Machine Shop Test,” Industrial Arts and Vocational
Education, Vol. 20: 132-3, April, 1931.

Fischer, Ferdinand A. P., Mechanical Drawing Tests, Milwaukee, Wisconsin:
Bruce Publishing Company, 1930.

Flaherty, Edward B., “Electrical Shop Tests for Grades 7, 8 and 9,” Industrial
Arts and Vocational Education, Vol. 19: 142, April, 1930.




Flam, August, “First-Year Mechanical-Drawing Test,” Industrial Arts Magazine,
Vol. 17: 223, 336, June and September, 1928.

Flam, August, “Second-Year Mechanical-Drawing Test,” Industrial Arts
Magazine, Vol. 17: 371, October, 1928.

Flam, August, “First-Year Mechanical-Drawing Multiple-Choice Test,”
Industrial Arts Magazine, Vol. 17: 70, February and March, 1928.

Flam, August, “Some Mechanical Drawing Tests,” Industrial Arts and
Vocational Education, Vol. 19: 150, April, 1930.

Hjertstedt, W. G., “Architectural Drawing Test,” Industrial Arts and Vocational
Education, Vol. 20: 141, April, 1931.

Hjertstedt, W. G., “Auto Mechanics Test,” Industrial Arts Magazine, Vol.
18: 239-40, June, 1929.

Horning, S. D., “Tests of Prognostic Value in Mechanical Drawing,” Industrial
Education Magazine, Vol. 29: 441-4, June, 1928.

Hunter, Wm. L., Shop Tests. Peoria, Illinois: Manual Arts Press, 1927-1930.
Woodworking Tests, Mechanical Drawing Tests, Machine Shop Tests,
Electrical Shop Tests, Auto Mechanics Tests, Printing Test, Related Subjects
Tests.

Johnson, H. J., “Electrical Tests,” Industrial Arts and Vocational Education, Vol.
20: 139-40, April, 1931.

Knorr, Harry W., “A Mechanical Drawing Test,” Industrial Arts and Vocational
Education, Vol. 20: 127-9, April, 1931.

Моввасн, Nevson J., “Sample Woodworking Tests,” Industrial Arts and
Vocational Education, Vol. 20: 129-30, April, 1931.

Nash, Harry B., and Van Duzee, Roy R., Industrial Arts Tests. Milwaukee,
Wisconsin: Bruce Publishing Company. (Test I, Scale A: Woodwork
Information, 1928; Test I, Scale B: Woodwork Performance, 1929; Test II,
Mechanical Drawing, Forms I and II, 1930; Instructional Review Tests in
Mechanical Drawing, 1930.)

Newkirk, L. V., and Stoddard, G. D., Home Mechanics Test. Iowa City, Iowa:
Bureau of Educational Research, 1928.

Weaver, C. G., “Trade Tests: Their Construction, Use and Possibilities in
Industry,” Industrial Arts Magazine, Vol. 10: 163-6, May, 1921.

Wells, C. K., Shop Tests. Peoria, Illinois: Manual Arts Press, 1929.


CHAPTER VIII 


INTELLIGENCE AND APTITUDE TESTS IN INDUSTRIAL 
EDUCATION 


I. MEASUREMENT OF INTELLIGENCE


41. Meaning of Intelligence. 

The exact nature of intelligence is not well understood, but it is 
definitely known that individuals vary quite widely in mental ability 
and that within limits it can be measured. Authorities in the field of 
mental measurement are far from agreement as to what the term in- 
telligence implies. Some consider that intelligence is best indicated 
by the ability of the individual to solve problems, to adapt himself 
to new situations. Others hold that the abilities to perceive with speed 
and accuracy, to associate symbols, to manipulate abstract concepts, 
and to reason, are the best evidences of intelligence. Facility in the 
use of language itself is considered by some to be one of the very
significant evidences of intelligence. For the purposes of this discussion
intelligence will be considered as the capacity for learning, plus the
informations, skills, and attitudes which the individual has gained
from reacting to his environment. This rather liberal conception of
intelligence permits it to fit readily into its place in the educational
program and also places in an acceptable light the majority of devices
for the measurement of general mental ability.


42. Measurement of General Mental Ability. 

Teachers and educators in general are aware that at least two 
related but different phases of intelligence must be taken into account 
in adequate mental measurement. Certain individuals react readily 
to abstract stimuli and thus are frequently rated as normal or even 
superior on the basis of mental tests in which abstract material
predominates. Other types of individuals do not respond to abstractions
but reveal unusual aptness in reacting to concrete and tangible
material. Stenquist ¹ has stated the case for this type of pupil most
convincingly, and has presented a very useful supplement to the
ordinary abstract type of mental measurement in his Mechanical
Aptitude Tests.

1 Stenquist, John L., “A Case for the Low I.Q.,” Journal of Educational
Research, Vol. 4: 241-54, November, 1921.

It is true that much remains to be learned about intelligence and
its measurement. There are those who argue that measures of
intelligence should not be used because its exact nature is not known. This
argument is no more valid than the statement that electricity should
not be measured or used because its exact nature is not known.
Intelligence, like electricity, can be measured to the distinct advantage
of society if the results are properly used. Unquestionably the scores
from mental tests do not reveal intelligence as exactly as the dials of
an electric meter indicate the number of watts of electricity
consumed. Nevertheless, a few reasonably reliable and valid measures of
intelligence are available for general use. Odell ² states that at least
two hundred tests of mental ability have been constructed since the
early work of Binet, and that approximately one hundred are still
available for use.


43. Methods of Measuring Intelligence. 


Intelligence tests are of two general types, individual mental ex- 
aminations and group tests of mental ability. Individual mental ex- 
aminations are thought to be considerably more valid, and because of 
the method of administering them they are probably more reliable 
than group tests. Individual examinations are expensive since they 
are given to only one subject at a time and since they should prefer- 
ably be given only by a trained examiner well grounded in psychology. 
Much of the significance of the individual examination lies in the in- 
terpretations of the subject’s reactions by the examiner as the stimuli 
are presented. Group tests are easy to administer, some being almost 
self-administering. 

The problems of measuring intelligence commonly met by the 
teacher of industrial education can usually be handled satisfactorily 
by the use of carefully selected group tests. However, the group test 
results should almost certainly be supplemented by the individual 
mental examination for those having very high or very low scores and 
for problem cases. Where this is not possible, or where the problem is 
not extremely serious, the use of two or even three group mental tests 
is to be recommended. The average of the mental-age scores obtained 
from two or three group tests is a much more accurate measure than 
that ordinarily obtained from a single testing. 
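
As a brief illustration of the averaging recommended above, the sketch below averages the mental-age scores from several group tests and expresses the result in the years-months form used in test manuals. The three scores and both function names are hypothetical, and the ages are assumed to be recorded in months.

```python
def average_mental_age(mental_ages_months):
    """Mean of the mental-age scores (in months) from several group tests."""
    if not mental_ages_months:
        raise ValueError("at least one mental-age score is required")
    return sum(mental_ages_months) / len(mental_ages_months)


def as_years_months(age_months):
    """Express an age in months in the years-months form, e.g. 151 -> '12-7'."""
    whole_months = round(age_months)
    return f"{whole_months // 12}-{whole_months % 12}"


# Hypothetical mental ages of one pupil on three different group tests.
pupil_scores = [150, 156, 147]
combined = average_mental_age(pupil_scores)
print(combined, as_years_months(combined))  # 151.0 12-7
```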

Since the problems of the industrial education teacher will
ordinarily be solved by the use of a good group test, and since most
classroom teachers are not adequately trained or experienced in the use of
the individual examination, a brief list of excellent group tests is
presented for detailed consideration. A short description and
evaluation of the most widely used individual mental examinations is given
in a later section in this chapter.

2 Odell, C. W., Educational Measurements in High School, p. 391.
The Century Company, New York, 1930.

44. Group Tests of Mental Ability. 

The four group tests of mental ability selected for description and
evaluation here are chosen from an extensive list of such tests. These
tests are suitable for use in grades VII to XII, inclusive. Each test
has been carefully validated and ranks high among such tests for
the reliability of the test forms themselves as well as for the reliability
of the age norms used as the basis of interpretation. These tests can
be readily administered to a group of any number of pupils. The
results are comparable within reasonable limits to those obtained on
the individual examinations.


1. KUHLMANN-ANDERSON INTELLIGENCE TESTS ³


The Kuhlmann-Anderson Intelligence Tests are the result of more
than ten years of careful research by both authors working in the
Research Division of the Minnesota State Board of Control. The
thirty-nine tests comprising the battery are arranged in a scale of
overlapping units, the net results of which closely approximate the
results from any good individual mental examination. The tests in
their most recently revised form are arranged in the booklets as shown
in Table 22.
TABLE 22

ARRANGEMENT OF KUHLMANN-ANDERSON TESTS

School Grade                  Tests      Age When Test

I, First semester .......      1-10        6-0
I, Second semester ......      4-13        6-6
II ......................      8-17        7-6
III .....................     12-21        8-6
IV ......................     15-24        9-6
V .......................     18-27       10-6
VI ......................     22-31       11-6
VII-VIII ................     25-34       13
IX-XII and adult ........     30-39       15-6


3 Kuhlmann, F., and Anderson, Rose, Kuhlmann-Anderson Intelligence Tests,
Educational Test Bureau, Minneapolis, Minnesota, 1927.




A somewhat novel procedure is used in interpreting the results of 
these tests. Each of the ten tests comprising a booklet is standardized 
separately. The test is scored in terms of the number of exercises 
answered correctly. By referring to a table of norms the mental age 
of the individuals making such a score is obtained. А mental-age 
score for each of the ten tests is thus obtained, the final mental-age 
score assigned to the pupil being the median of the resulting mental 
ages. This procedure appears to result in unusually reliable
measurement of mental ability.⁴
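
The median mental-age procedure just described can be sketched as follows. The norm table, the raw scores, and the function names are hypothetical stand-ins; the published norm tables are not reproduced here, and three tests are used in place of ten only to keep the example short.

```python
from bisect import bisect_right
from statistics import median


def mental_age_for_score(raw_score, norm_table):
    """Look up the mental age (months) for one test's raw score.

    norm_table is a list of (minimum raw score, mental age) pairs sorted by
    score; the entry with the highest minimum not exceeding the score applies.
    """
    minimums = [minimum for minimum, _ in norm_table]
    index = bisect_right(minimums, raw_score) - 1
    if index < 0:
        raise ValueError("raw score falls below the lowest norm")
    return norm_table[index][1]


def median_mental_age(raw_scores, norm_tables):
    """Convert each test's raw score to a mental age, then take the median."""
    ages = [mental_age_for_score(score, table)
            for score, table in zip(raw_scores, norm_tables)]
    return median(ages)


# Hypothetical norms: the same illustrative table is reused for every test.
illustrative_norms = [(0, 96), (5, 108), (10, 120), (15, 132)]
pupil_raw_scores = [4, 11, 12]
print(median_mental_age(pupil_raw_scores, [illustrative_norms] * 3))  # 120
```

Because the median rather than the sum is taken, one unusually poor or unusually good performance on a single test has little effect on the final mental-age score.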


2. OTIS GROUP INTELLIGENCE SCALE ⁵
ADVANCED EXAMINATION, FORMS A, B


This was one of the first tests to appear for measuring intelligence 
at the secondary-school level. It has been widely used in Grades VII 
to XII inclusive. It is composed of ten divisions as follows: following 
directions, opposites, disarranged sentences, proverbs, arithmetic, 
geometric figures, analogies, similarities, narrative completion, and 
memory. The test requires more than an hour to give; it has 230
test elements with an actual working time of about 45 minutes. The 
coefficient of reliability for grades and half grades is around .84 and 
around .97 for all grades combined. The test correlates approxi- 
mately .75 with a suitable criterion. 


3. SELF-ADMINISTERING TEST OF MENTAL ABILITY ⁶
HIGHER EXAMINATION, FORMS A, B, C


This test is unique in that it requires a minimum amount of in- 
struction from the examiner. For this reason industrial education 
teachers will find this a very satisfactory test to use. The test has two 
time limits of 20 and 30 minutes. Generally the 30-minute time is 
satisfactory except possibly in the last years of the senior high school. 
The reliability of the test is reported as .92, and it has a high correla- 
tion with a valid criterion. 


4 Kuhlmann, F., “A Median Mental Age Method of Weighting and Scaling 
Mental Tests,” Journal of Applied Psychology, June, 1927, 

Pintner and Patterson, A Scale of Performance Tests, 1917. 

5 Otis, A. S., Group Intelligence Scale, World Book Company,
Yonkers-on-Hudson, New York, 1918.

6 Otis, A. S., Self-Administering Test of Mental Ability, World Book Com- 
pany, Yonkers-on-Hudson, New York, 1922. 




4. TERMAN'S GROUP TEST OF MENTAL ABILITY ⁷
FORMS A AND B


The Terman test is composed of ten divisions as follows: informa- 
tion, best-answer, word meaning, logical selection, arithmetic, sen- 
tence meaning, analogies, mixed sentences, classification, and number 
series. Two approximately equal and interchangeable forms are avail- 
able. There are 185 items in each form of the test. It can be given in 
40 minutes, although the actual working time is only 27 minutes. 
The reliability of the test is approximately .90. It correlates .75 
with a suitable criterion of mentality. Complete tables of mental-age
norms are given in the examiner's manual.


45. Individual Mental Examinations. 

Individual mental examinations probably constitute the most
accurate devices for the measurement of intelligence. The length of the
test, the wide variety of reactions called for, the fact that the subject
receives his instructions personally from the examiner, and the fact
that this affords the examiner an opportunity to observe each reaction
of the subject all combine to account for this greater accuracy of
measurement. However, this greater accuracy is offset by greater
expense in the administration of the tests, in terms of both time and
money. In fact, it is thought by many that this expense is so great
that in most classroom and shop situations the resulting increase in
accuracy of measurement is not commensurate with it. Accordingly,
it is quite probable that teachers of industrial education will find it
desirable to initiate their analysis of problem cases by first using
good group tests of mental ability. Individual mental examinations
may be given later to a relatively small number of pupils who deviate
most widely from the normal.

A very simple procedure will reveal directly to the teacher the
special individuals who should receive further attention. If the group
mental-test scores for the entire class are tabulated in a frequency
table, or if the test papers themselves are merely arranged in
descending order of size of the test scores, the individual pupils deviating
most widely from the average for the class and from the normal mental
age for the grade will be revealed. Thus, it may be necessary to retest
(or to give the individual mental examination to) only a small
percentage of the group. After all, it is in the highest and lowest
extremes of intelligence that the problem cases arise, and it is also in this
same group that the most serious errors or misinterpretations are
likely to take place. Most present-day tests of mental ability
accomplish a reasonably accurate placement of the more nearly normal
group.

7 Terman, L. M., Group Test of Mental Ability, World Book Company,
Yonkers, New York, 1920.
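
The sorting procedure described above might be sketched as follows. The pupil labels, the mental ages, and the number of pupils to flag are hypothetical, and deviation is measured here from the class average only, which is one simple reading of the procedure.

```python
from statistics import mean


def rank_descending(mental_ages):
    """Pupils ordered from highest to lowest group-test mental age."""
    return sorted(mental_ages, key=mental_ages.get, reverse=True)


def pupils_to_retest(mental_ages, how_many):
    """Pupils whose scores deviate most widely from the class average."""
    class_average = mean(mental_ages.values())
    by_deviation = sorted(
        mental_ages,
        key=lambda pupil: abs(mental_ages[pupil] - class_average),
        reverse=True,
    )
    return by_deviation[:how_many]


# Hypothetical class of six pupils, mental ages in months.
class_scores = {"A": 150, "B": 152, "C": 118, "D": 149, "E": 185, "F": 151}
print(rank_descending(class_scores))      # ['E', 'B', 'F', 'A', 'D', 'C']
print(pupils_to_retest(class_scores, 2))  # ['E', 'C']
```

The two flagged pupils sit at the top and bottom of the ranked list, which is where the text expects the problem cases and the largest errors of measurement to appear.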


STANFORD REVISION OF BINET-SIMON INTELLIGENCE TESTS ⁸


This extensive mental scale includes groups of tests suitable for 
the measurement of mental ability from three years to fourteen in- 
clusive as well as tests suitable for average adults and superior adults. 
There is a complete manual of directions, stating in detail just how to 
administer and score the test. The reliability of the test is around
.95, and its validity is commonly considered as a standard in
constructing other group and individual tests of intelligence. The validity
of the test has been frequently criticized because it is composed of so 
much verbal material, but this same criticism can be applied to many 
intelligence tests. This is one of the most widely used of the individual 
tests of intelligence for classroom use. 


46. Results of Mental Measurement. 


Intelligence-test scores should be interpreted with great care. After 
all, such scores are only estimates of intelligence and should not be 
considered absolute measures. The individual is very complex, and 
many factors may affect the rating of the pupil. In the first place, 
the differences in environment and in training opportunities are fre- 
quently overlooked in interpreting mental-test scores. The intelligence 
of different individuals may legitimately be compared only when there 
is assurance that the learning opportunities have been the same. This 
fact, incidentally, is difficult to establish. Accordingly, few such 
comparisons are legitimate. Numerous other factors such as the in- 
dividual’s inability to see, to read, to hear, or some other temporary 
physical disability may seriously influence the score. Errors in the 
administration of the tests, errors in scoring, and clerical errors in 
transcribing and in computing results must be guarded against at all 
times. The shop teacher should be especially critical of very high 
and very low scores since they are the ones most likely to be in 
error. Every very low and every very high score should be carefully
rechecked by other group tests or by individual examinations before
any serious administrative or instructional adjustments are made.

8 Terman, L. M., et al., The Stanford Revision and Extension of the
Binet-Simon Scale for Measuring Intelligence, Warwick and York, Baltimore,
1917. Test material also through Houghton Mifflin Company, Boston, and
C. H. Stoelting Company, Chicago.


47. Mental-Age Score. 

The results of mental testing which are of most use to the
classroom teacher are the mental-age scores. These scores are derived
from the raw test scores and afford the basis for the calculation of a
number of useful derived scores called quotients. Mental-age norms
for an intelligence test are commonly secured by administering the
mental examination to large numbers of individuals of various age
levels. After the tests have been scored the papers are usually
assembled by age groups, all the nine-year-olds being placed in one
group, all the ten-year-olds in another, etc. In this way the typical
scores to be expected of individuals of different chronological ages
may be determined. If the typical point score of a large group of
ten-year-olds should be 126 on a given mental test, thereafter any
individual making a score of 126 points would be designated as having
a mental age of ten years. Certain of the more carefully validated
and standardized group tests have established their mental-age norms
on the basis of results from large numbers of individual mental
examinations.
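
The derivation of mental-age norms described above can be sketched in a few lines. The records are hypothetical, the "typical" score of each age group is taken to be its median (an assumption, since the text does not name the statistic), and a pupil is assigned the age whose typical score lies nearest his own.

```python
from collections import defaultdict
from statistics import median


def build_age_norms(records):
    """Group (age in years, point score) records by age; keep each median."""
    scores_by_age = defaultdict(list)
    for age, score in records:
        scores_by_age[age].append(score)
    return {age: median(scores) for age, scores in sorted(scores_by_age.items())}


def mental_age(score, norms):
    """Age whose typical score lies nearest to the pupil's point score."""
    return min(norms, key=lambda age: abs(norms[age] - score))


# Hypothetical standardization data: three pupils at each of three ages.
standardization = [
    (9, 110), (9, 114), (9, 118),
    (10, 122), (10, 126), (10, 130),
    (11, 136), (11, 140), (11, 144),
]
norms = build_age_norms(standardization)
print(norms)                   # {9: 114, 10: 126, 11: 140}
print(mental_age(126, norms))  # 10
```
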
48. Intelligence Quotients. 
Mental-age scores make possible the derivation of a series of
quotients which are very useful in the interpretation of mental- and
achievement-test results. The most commonly used quotient of this
type is the intelligence quotient or I.Q. The I.Q. is the ratio of the
mental age to the chronological age of the individual tested. The
formula for the I.Q. is

$$\text{I.Q.} = \frac{\text{M.A.}}{\text{C.A.}}$$

in which the M.A. is the mental age and the C.A. is the chronological
age, both expressed in months. The I.Q. itself expresses the relative
mental development of the individual. If the pupil makes a score on
the mental test which gives him a mental age identical with his life
age his resulting I.Q. is 1.00, or 100 as it is usually expressed. A
pupil is commonly considered normal if his I.Q. falls between 90 and
110. Intelligence quotients of 110 to 120 are above average, and above
120 are superior. Quotients above 130 approach the genius class, and
quotients of 140 to 150 indicate most unusually accelerated mental
development. Similarly, quotients of 80 to 90 are below average.
Individuals with quotients of less than 80 are poor and may be
expected to encounter much difficulty in the mastery of abstract
material at the junior- and senior-high-school level. Quotients of 70
to 80 are very low, and quotients below 60 indicate exceedingly
retarded mental development bordering on the moron level and idiocy.
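
A small worked example of the quotient may be helpful. The sketch below applies the ratio of mental age to chronological age, both in months, and multiplies by 100 in the customary way; the function names and the ages used are hypothetical, and only the 90 to 110 band named in the text is checked.

```python
def intelligence_quotient(mental_age_months, chronological_age_months):
    """I.Q. = 100 * M.A. / C.A., with both ages expressed in months."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return round(100 * mental_age_months / chronological_age_months)


def is_normal_range(iq):
    """True when the quotient falls between 90 and 110, as given in the text."""
    return 90 <= iq <= 110


# Hypothetical pupil: mental age 12 years 6 months, life age 12 years 0 months.
iq = intelligence_quotient(150, 144)
print(iq, is_normal_range(iq))  # 104 True
```

A quotient computed this way from a group test is subject to the caution given in the text: it should not be used for any serious purpose on an individual basis.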

In the interpretation of the quotients derived from group
mental-test scores it should be remembered that such quotients represent
individual interpretations, whereas, as a matter of fact, the tests on
which they are based are group tests. In general, intelligence quotients
based on group-test results should not be utilized for any serious
purpose on an individual basis.

Intelligence-test scores are generally regarded as professional
information to be used in teaching and guidance, but not to be given to
the pupil or his parents. Long experience with intelligence tests has
proved this to be a wise policy. The scores from such tests are
suggestive to the teacher and should be used only as indicative of
capacity. When these ratings are given to the layman he is likely to
look upon them as final rather than suggestive and fail to interpret
them in the light of a professional background. There may be rare
occasions when it is feasible to give this information to a pupil or
his parents provided it will aid the pupil or parent in a better
understanding of the pupil’s possibilities of accomplishment or his future
needs. It is possible sometimes to encourage a pupil who is doing poor
work and is discouraged by pointing out to him that he has average
native ability and can succeed with proper application. Occasionally
it may be well to point out to a dull pupil that he is doing well in the
light of his ability. It may be that a lazy, bright pupil can be
motivated by pointing out his failure to capitalize his real ability.
Occasionally, parents who punish their children for low marks can be
made a little more lenient by showing them that their children are
doing well for their ability. These suggestions must be used with
extreme tact and care or they will prove destructive rather than
constructive.

The shop teacher must be careful to distinguish clearly between
intelligence-test scores and tests of achievement. The intelligence test
gives an indication of the student's capacity for acquiring information
largely through the use of abstract processes. The achievement test
aids in the measurement of actual accomplishment in the class and
can be used as a partial basis for assigning shop marks. The
intelligence-test score is of value in teaching and guidance; it should
not be used directly in marking achievement inasmuch as intelligence
tests are not measures of achievement in specific courses.




II. MEASUREMENT OF SPECIAL APTITUDES


49. Measurement of Special Abilities. 

The measurement of general mental ability suggests the possibility 
of securing objective evidence of special types of abilities or aptitudes. 
This is a field of measurement in which all teachers in the secondary
schools should be interested, for adequate educational guidance is be- 
coming more and more important at these levels in our educational 
program. Educational and vocational misfits, the high pupil mortality 
in certain of our courses, the heavy teacher-burden caused by increas- 
ingly large classes, as well as the general embarrassment to the school 
resulting from the misapplication of abilities, all demand that more 
attention be given to this phase of educational measurement. In- 
dustrial education teachers, because of their proximity to the problems, 
represent a group which should be greatly interested. 


50. Measuring Mechanical Aptitude. 

An aptitude may be thought of as the capacity of an individual
for the development of some special ability or skill. Mechanical
aptitude is the capacity of an individual to deal successfully with
mechanical devices, and to acquire the knowledge essential to their
selection and operation after suitable training has been given. An
individual who has a large measure of mechanical aptitude, other things
being equal, will readily succeed when given instruction. On the other
hand, an individual with low mechanical ability is likely to fail
regardless of the instruction or opportunities given to work with
mechanical devices.

The importance of identifying mechanical aptitudes is more
obvious when it is realized that at least 40 per cent of the gainfully
employed population in the United States is dependent in some measure
for its economic success on the possession of mechanical ability. It
is true, of course, that mechanical ability is only one factor in success
even in mechanical pursuits, but it is also true that it is quite an
important factor. The industrial education teacher should keep this
clearly in mind and should blend other important guidance
information with the evidence of the pupil’s mechanical ability.


It thus becomes apparent that a knowledge of the student’s
mechanical ability is important to industrial education teachers from
both the guidance and the instructional points of view. Knowledge of
the fact that an individual has low or high mechanical aptitude gives
the industrial education teacher an objective basis for guiding the
pupils into or out of vocations which involve mechanical abilities.
It enables the teacher to assign projects better adapted to
the individual differences of the pupils in the class. Such a knowledge
is of value to trade- or continuation-school teachers in selecting
individuals who are likely to profit by the instruction offered.
However, it is well to bear in mind that mechanical-ability tests must be
carefully administered and interpreted, and that, at best, they are
merely very suggestive and should not be considered infallible.

It is widely known that individuals vary in mechanical ability.
It is also known that mechanical ability does not correlate highly
with intelligence, the coefficient usually being around +.40. Stenquist
pointed this out a number of years ago. This does not mean that
many individuals with high intelligence as measured by intelligence
tests do not have high mechanical ability, nor does it mean that
individuals with low intelligence always have high mechanical ability.
It strongly suggests that there may readily be a concrete aspect of
intelligence which is not necessarily an accompaniment of intelligence
of the abstract type. Paterson, Elliott, et al.⁹ report that there is a
fairly uniform tendency for test scores on mechanical ability to
increase with chronological age between eleven and twenty. The same
authors found little support for the supposition that men excel women 
in mechanical ability. The only test in which men and boys clearly 
excelled was in the Minnesota Assembly Test, and this was probably 
due to greater experience with the materials. Judging from the data 
available, engineering students are not superior to liberal arts students 
in innate mechanical ability. This emphasizes the fact that guidance 
is made up of many important factors and even in engineering colleges 
mechanical ability is not an infallible guidance factor. 

Industrial education teachers can readily see the guidance value of 
tests of mechanical ability, and fortunately some good tests are avail- 
able for use. The three measures of mechanical ability described and 


discussed in the following pages deserve careful study by shop and 
drawing teachers. 


1. MINNESOTA MECHANICAL ABILITY TESTS ¹⁰


The Minnesota Mechanical Ability Tests are the outcome of
research at the University of Minnesota during the years 1923-1927.
They are probably the most carefully prepared tests of mechanical
aptitude that have been published for general use. The tests are quite
reliable, and their validity has been carefully checked against objective
criteria.

9 Paterson, Elliott, Anderson, Toops, Minnesota Mechanical Ability Tests,
University of Minnesota Press, Minneapolis, Minnesota, 1930, pp. 282-284.

10 Materials for the Minnesota Mechanical Ability Tests may be obtained
from the Marietta Apparatus Company, Psychological Equipment, Marietta,
Ohio.

When administered according to directions the Minnesota
Mechanical Ability Tests will give results which will be very useful in
teaching and guidance. The battery includes the following six tests:


(a) Minnesota Paper Form Board, Series A and B, for either
group or individual testing.
(b) Minnesota Spatial Relations Test (individual test). 
Boards A and B. 
Boards C and D. 
(c) Minnesota Assembly Test (group or individual). 
Long form. 
Box A, B, and C. 
Short form. 
Box 1 and 2. 
(d) Minnesota Interest Analysis Test. 
(e) Packing Blocks Test. 
(f) Card Sorting Test. 


The authors report the following coefficients of reliability and
validity for these tests.

TABLE 23

COEFFICIENTS OF RELIABILITY AND VALIDITY OF MINNESOTA MECHANICAL
ABILITY TESTS¹¹

                                                          Validity Coefficient
Test                                              r11*    (Between test and
                                                          quality criterion)
Minnesota Assembly, Boxes A, B, C                 .94         .55
Minnesota Paper Form Board, A and B               .90         .52
Minnesota Spatial Relations, Boards A, B, C, D    .84         [illegible]
Packing Blocks                                    [illegible] .26
Card Sorting                                      .90         .19

* r11 stepped up by Spearman-Brown formula.
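
The footnote to Table 23 refers to the Spearman-Brown formula, which estimates the reliability of a whole test from the correlation between two comparable parts of it. A minimal sketch in Python follows; the function name and the sample coefficient are illustrative assumptions and are not taken from the Minnesota manual.

```python
def spearman_brown(r_part: float, k: float = 2.0) -> float:
    """Estimate the reliability of a test k times as long as the part scored.

    r_part is the correlation between comparable parts (for example, two
    halves of a test); k = 2 steps a half-test correlation up to full length.
    """
    return k * r_part / (1.0 + (k - 1.0) * r_part)


# Hypothetical correlation between two halves, chosen only for illustration.
r_halves = 0.82
r_full = spearman_brown(r_halves)
print(f"half-test r = {r_halves:.2f}, stepped-up r = {r_full:.2f}")
```

The stepped-up coefficient is always at least as large as the part correlation when that correlation is positive, which is why a reliability reported in this way should be read as an estimate for the full-length test.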


The manual of directions gives instructions for administering, scor- 
ing, and interpreting results. The authors state that the examiner need 
not be a trained psychologist to administer the tests, but that he 
should be thoroughly familiar with the test to give it successfully. 


11 Op. cit. 




The test is quite elaborate and rather expensive in time and money, 
but the combined battery will yield a satisfactorily valid and reliable 
score for most guidance purposes. 


2. STENQUIST'S ASSEMBLING TESTS OF GENERAL MECHANICAL
ABILITY¹²


This is one of the first assembling tests of mechanical ability to be 
widely used. The test is composed of a small rectangular box divided 
into ten compartments. Each compartment contains a small me- 
chanical device which is common in the experience of most people. 
Some of the items selected by the author are a mouse trap, push 
button, simple lock, and bicycle bell. These mechanical devices are 
arranged in order of difficulty of response. In scoring, ten points are 
allowed for each device, and the score on each item is the number 
of correct points in assembling the device. Stenquist also suggests a 
short method in which just the devices almost or completely assembled 
are counted and given as the score. The second method would be 
less reliable because it disregards partial accomplishment. The test 
also gives a bonus of one-half point per minute for each minute under 
30 minutes in responding to the test.
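
The scoring rules just described may be sketched as follows. The function names, the sample point values, the treatment of the time bonus as applying to whole minutes only, and the cut-off of nine points for a device counted as "almost" assembled are all assumptions made for illustration; Stenquist's own directions should govern actual scoring.

```python
def assembling_score(device_points, minutes_used,
                     time_limit=30, bonus_per_minute=0.5):
    """Full-method score: points on ten devices plus a bonus for time saved."""
    if len(device_points) != 10:
        raise ValueError("the box holds ten devices")
    if any(p < 0 or p > 10 for p in device_points):
        raise ValueError("each device is scored from 0 to 10 points")
    # Assumption: only whole minutes under the limit earn the bonus.
    minutes_saved = max(0, int(time_limit - minutes_used))
    return sum(device_points) + bonus_per_minute * minutes_saved


def short_method_score(device_points, nearly_complete=9):
    """Short-method score: count of devices almost or completely assembled.

    The threshold of 9 points is a hypothetical stand-in for 'almost'.
    """
    return sum(1 for p in device_points if p >= nearly_complete)


# Hypothetical record for one pupil who finished in 26 minutes.
points = [10, 10, 9, 8, 10, 6, 4, 0, 2, 0]
print("full method :", assembling_score(points, minutes_used=26))
print("short method:", short_method_score(points))
```

The short method ignores the 20 points this pupil earned on partly assembled devices, which illustrates why it would be the less reliable of the two.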

According to the literature this test has a reliability of .70 and 
correlates with intelligence-test scores about +.20 to +.30. The 
test measures certain aspects of mechanical ability, but has too low 
a reliability to be used with assurance in measuring mechanical ability
in individual cases. It correlates with teachers’ marks in mechanical 
subjects as high as +.80. Paterson, Elliott, and others found that 
the test correlated +.26 with the objective criterion used in the 
validation of the Minnesota Tests of Mechanical Ability. They also 
found that, by increasing the length of the test, its reliability and 
validity were greatly improved. Norms for different age groups are
given in the accompanying table.


TABLE 24 
PERCENTILE NORMS 


Percentiles 

Age 5 10 25 50 75 90 95

i bi d 2 2 J 
Fifteen 1.0 17 3.2 46 6. 5 19 
i 14 24 43 60 74 80 
Fourteen 10 ie 

i 15 25 39 53 66 4 
"Thirteen 10 е 2 us Е $8 

Twelve Л 10 1. 2 4 E 4 


12 C. H. Stoelting Company, Chicago. 




3. STENQUIST MECHANICAL APTITUDE TESTS
TESTS I AND II


These pencil-and-paper tests of mechanical aptitude are made up 
of pictures of mechanical things which are common in the experience 
of most people. The tests are not two equivalent forms, but the two 
parts (I and II) are to be used to supplement each other. In general
the student taking the test has to recognize mechanical things that 
belong together or work together and answer questions about parts 
or operations of machines. The working time on the first test is 45 
minutes, and on the second 50 minutes. The two forms together re- 
quire 173 responses. 

The reliability of the test appears to be around .75. Paterson 
and Elliott have shown that this can be increased to .89 or .90 by 
increasing the length of the test. The validity as checked by the best 
known criterion is lower than the assembly test even when the reli- 
ability has been corrected. Correlation with the objective criterion of 
the Minnesota tests is around +.30. The test correlates as high as 
+.60 with teachers’ marks in shop courses. 
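
If it is assumed that the gain from lengthening follows the Spearman-Brown relation (the text does not state how the estimate was made), the amount of lengthening needed to raise a reliability of .75 to .90 can be worked out as in this sketch. The function name is hypothetical.

```python
def lengthening_factor(r_now: float, r_target: float) -> float:
    """How many times longer a test must be to reach r_target.

    Obtained by solving the Spearman-Brown relation
    r_target = k * r_now / (1 + (k - 1) * r_now) for k.
    """
    if not (0.0 < r_now < 1.0 and 0.0 < r_target < 1.0):
        raise ValueError("reliabilities must lie between 0 and 1")
    return r_target * (1.0 - r_now) / (r_now * (1.0 - r_target))


k = lengthening_factor(0.75, 0.90)
print(f"the test would need to be about {k:.1f} times its present length")
```

Under this assumption the test would have to be roughly tripled in length, which with a working time of 95 minutes for the two parts is a considerable demand on school time.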


III. SUMMARY


The methods of measuring general and special types of mental 
abilities are discussed in this chapter. Intelligence, as treated in this 
discussion, is considered to be the capacity for learning, plus the
information, skills, and attitudes which the individual has gained from
reacting to his environment. Certain individuals react readily to
abstract stimuli; others respond most readily to concrete and tangible
situations. For this reason there seems to be a real need for both 
abstract and concrete types of mental stimuli. 

Intelligence tests are commonly classified as individual mental ex- 
aminations and group tests of mental ability. The results of mental 
testing which are of most use to the classroom teacher are the mental- 
age scores. These and all other scores derived from mental tests 
should be regarded as professional information of the most confidential 
type and used accordingly. 

An aptitude is the capacity of an individual to develop special
abilities or skills. Mechanical aptitude represents the potential ability 
of the individual to deal successfully with mechanical devices, and the 
knowledge essential to the selection and operation of such devices 
after a suitable period of training. An early knowledge of special
aptitudes on the part of individual pupils is of great importance to
the teacher of industrial education courses.


SUMMARY EXERCISES FOR DISCUSSION 


1. State what seems to you to be a practical and accurate definition of
intelligence.

2. List the outstanding advantages and disadvantages of group mental tests.

3. What are the major advantages and disadvantages of the individual mental
examination over the group test?

4. What is the difference between the mental age and the I.Q.?

5. Show how the shop teacher needs mental-test results as a protection against
the possible misinterpretation of achievement-test results.

6. What is the basis for the statement that aptitude tests are tests of special
types of intelligence? Is it true?

7. Why should the shop teacher be especially concerned with results from
aptitude tests?

8. From available sources make a list of the mental tests and special-aptitude
tests which would seem to provide the most useful information for the
industrial education teacher.


SELECTED REFERENCES 


Bird, Verne A., and Pechstein, L. A., “General Intelligence, Machine Shop
Work, and Educational Guidance in the Junior High School,” School Review,
Vol. 29: 782-6.

Baker, Harry J., and Crockett, A. C., Detroit Mechanical-Aptitude Examination.
Bloomington, Illinois: Public School Publishing Company, 1928.

Волар, Edna; Marsh, Wirta; and Stockwell, Lynn E., “Relation of General
Intelligence to Mechanical Ability,” Industrial Arts Magazine, Vol. 16: 330-2,
September, 1927.

Carpenter, J. B., “The Function of Mental Tests in the Administration and
Supervision of a Vocational School,” Vocational Education Magazine, Vol.
2: 65-6, September, 1923.

Freeman, F. N., Mental Tests, Their History, Principles and Applications.
Boston: Houghton Mifflin Company, 1926.

Fryklund, Verne C., “Intelligence and the Shop,” Industrial Arts Magazine,
Vol. 17: 81-3, March, 1928.

Gordon, Geo. Th., “Relation of Pupils’ Intelligence Quotients to Their Grades
in High School Shops,” Industrial Education Magazine, Vol. 30: 249-50,
January, 1928.

Buen M., “Intelligence and Shop Accidents,” Industrial Arts Magazine,
Vol. 18: 265-6, August, 1928.

Herring, J. P., Herring Revision of the Binet-Simon Tests. Yonkers, New York:
World Book Company, 1922.

Hull, C. L., Aptitude Testing. Yonkers, New York: World Book Company,
1928.

Keane, Francis L., and O'Connor, Johnson, “A Measure of Mechanical Aptitude,”
Personnel Journal, Vol. 6: 15-24, January, 1927.


Kuhlmann, F., and Anderson, Rose, Examiners Manual for Kuhlmann-Anderson
Intelligence Tests. Minneapolis: Educational Test Bureau, Inc., 1927.

MacQuarrie, T. W., A Test for Mechanical Ability. Hollywood: Southern
California School Book Depository, 1925.

Madsen, I. N., “The Contributions of Intelligence Tests to Educational
Guidance in High School,” School Review, Vol. 30: 692-701, November, 1922.

Meier, N. C., and Seashore, C. E., Art-Judgment Test. Iowa City, Iowa: Bureau
of Educational Research, University of Iowa, 1929.

Minnesota Paper Form-Board Test, Series A and B. Marietta, Ohio: Marietta 
Apparatus Company, 1920. 

Otis, Arthur S., Group Intelligence Scale, Examiners Manual. Yonkers, New
York: World Book Company, 1918.

O'Rourke, L. J., Mechanical Aptitude Test. Washington, D. C.: The Educational
and Personnel Publishing Company, 1926.

Proctor, W. M., “Psychological Tests and Guidance of High School Pupils,” 
Journal of Educational Research Monographs, No. 1. Bloomington, Illinois: 
Public School Publishing Company, 1923. 

Reedy, Caroline M., “Can Intelligence Tests Help to Solve the Continuation
School Classification Problem?” Vocational Education Magazine, Vol. 2: 
151-5, October, 1923. 

Ruru, Norton W., Electrical-Inclination Test. Chicago: C. H. Stoelting Com- 
pany, 1927, 16 pages. 

Seashore, C. E., and others, “Mentality Tests: a Symposium,” Journal of
Educational Psychology, Vol. 7: 229-40; 278-80; 348-60; April, May, June, 1916.

Smith, H. L., and Wright, W. W., Tests and Measurements, Chapter XX,
“Prognosis and Special Ability Tests.” New York: Silver Burdett and
Company, 1928.

Stenquist, John L., Assembling Tests of General Mechanical Ability, Series I,
II, and III. Chicago: C. H. Stoelting Company, 1922.

Stenquist, John L., Mechanical Aptitude Tests. Yonkers, New York: World
Book Company, 1927.

Stoy, E. G., “Tests for Mechanical Drawing Aptitude,” Personnel Journal, Vol.
6: 93-101, and 361-6, July, 1927, October, 1927.

SUTHERLAND, Б. 5., "Correlation Between Intelligence and Skill in Shop Work," 
Industrial Arts Magazine, Vol. 17: 204-5, June, 1928. 

Terman, L. M., The Intelligence of School Children. Boston: Houghton Mif- 
flin Company, 1919. 

Terman, L. M., and others, The Stanford Revision and Extension of the Binet-
Simon Scale for Measuring Intelligence. Baltimore, Maryland: Warwick 
and York, 1917. 


CHAPTER IX 
TESTS IN RELATED EDUCATIONAL FIELDS 


51. Relation of Other Educational Achievement to Industrial Arts. 

Achievement in industrial education cannot be measured adequately
without the supplementary information procurable only
through the use of educational tests selected from other, related fields.
Just as it is impossible to give a proper interpretation to the results of 
achievement testing in any field of subject-matter without the use of 
such supplementary information as mental tests provide, so it is 
difficult if not impossible to secure a complete evaluation of instruction 
in the industrial arts without the use of definite and accurate measures 
of the many factors which contribute to achievement in these subjects. 

Achievement in the content subjects is limited to a very high de- 
gree by the student's reading ability. The comprehension of the 
precise types of directions, symbols, and instructions given in the in- 
dustrial arts subjects is basic. Certain skills in arithmetic, algebra, 
and the sciences are essential in shop work. Mastery of certain Eng- 
lish usages and mechanics is as essential to acceptable achievement 
in this field as in almost any other. A reasonable skill in freehand 
drawing, lettering, and handwriting is also an important limiting factor
in industrial arts achievement. In this chapter a number of the
more useful educational tests selected from important fields related
to industrial education are described.


52. Reading Tests. 

The development of the ability to read is one of the most im- 
portant educational and vocational accomplishments of the school. 
Achievement and school progress depend to a very large degree upon 
reading ability, and the higher up in the grades the pupil progresses 
the more important does reading ability become. In fact, in the high
school and college, reading ability is the most important single means
by which knowledge and information are secured. The recognition of
silent reading as a basic study tool has done much to improve the
quality of initial instruction given in the subject. It has also resulted
in stimulating the development and use of effective remedial instruction
in this field.




The elementary school is expected to develop the general reading 
skills, including an effective comprehension of the content of material 
read at an economical and efficient rate. Unduly slow reading is a 
handicap, just as is poor comprehension. However, the elementary 
school deals with reading problems involving comparatively common 
usages and simple vocabularies. It does not concern itself especially 
with technical terms and symbols used in the special subjects, such 
as the industrial education courses. Thus it becomes largely the re- 
sponsibility of industrial education teachers to train their pupils to 
read the technical phases of their specific subjects. One of the most 
effective ways to accomplish this is for the teacher to determine at 
once the general reading ability of his class. In any event, it is quite 
certain that the pupil must have a general reading ability before he 
can acquire the technical reading ability required in many of the
specialized industrial education courses. 

A number of excellent general tests of silent reading ability are 
available, but thus far no one has developed a particularly outstand- 
ing device for the measurement of the pupil’s ability to read the 
technical content of specialized courses. Special reading tests designed
to parallel the teaching in the industrial education subjects are
badly needed. Reading tests using technical content selected from 
the fields of woodworking, drawing, sheet-metal working, auto me- 
chanics, electricity, printing, etc., would be of great value. 

The two reading tests described in this chapter are primarily 
tests of reading ability as it exists at the secondary-school level. 
The tests have been selected as being best suited to meet the needs 
of teachers and supervisors of industrial education courses in securing 
objective and detailed analytical information concerning the different
aspects of the reading ability of their pupils. The results from the
use of the tests and the general specifications upon which they are
built should prove helpful to industrial arts teachers in constructing
and using technical reading tests for the various industrial education
subjects.


1. HAGGERTY READING EXAMINATION, SIGMA 3,
FORMS A AND B¹


This distinctly valuable test is designed to measure general silent
reading ability from the fifth grade through the twelfth. The total
score on the test indicates a measure of general reading comprehension.
The test is composed of three subtests of vocabulary, sentence reading,
and paragraph reading. The vocabulary content of the test was validated
by selecting the common words used in seventh- and eighth-grade
readers and history texts. The actual working time is 28
minutes, but the administration of the test requires about 45 minutes.

1 Haggerty, M. E. and Laura C., Reading Examination, Sigma 3, World
Book Company, Yonkers, New York, 1920.


2. IOWA SILENT READING TEST, ADVANCED
FORMS A AND B (REVISED)²


This test is designed to secure an analytical measurement of the
silent reading skills used in reading of the work-study type. By the 
use of a series of tests sampling into several different types of reading 
skills the total comprehension score is intended to reveal general read- 
ing ability. The scores on the separate test parts afford the basis for 
the analysis of the strengths and weaknesses of individual students. 
The several parts of the test cover the following unit skills which con- 
tribute to the student’s ability to read and to work with books: 


Test 1. Paragraph meaning. 
Science material. 
Literary material. 

Test 2. Word meaning. 
Social science vocabulary. 
Science vocabulary. 
Mathematics vocabulary.
English vocabulary. 

Test 3. Paragraph organization. 
Selection of central idea. 
Outlining. 

Test 4. Sentence meaning. 

Test 5. Location of information. 
Use of the index. 
Selection of key words. 

Test 6. Rate of silent reading. 


The total working time for this test in its recently revised form 
is 35 minutes. This makes it easy to administer the test within a 
class period. In spite of this relatively short working time the reli- 
ability of the test is high. The reliability coefficients and the P.E.’s of 
scores reported in Table 25 indicate that scores on these tests may be 
taken as very accurate measures of the silent reading abilities of
high-school or college students.

2 Greene, H. A., Jorgensen, A. N., and Kelley, V. H., Iowa Silent Reading
Test, Advanced. Revised Edition for High Schools and Colleges. World Book
Company, Yonkers, New York, 1931.


TABLE 25

RELIABILITY OF IOWA SILENT READING TEST, ADVANCED *

                          Lowest                          Highest
Test               r      P.E. Score   Grade       r      P.E. Score   Grade
1              [illegible]    .65         9        .91       .62        13
2              [illegible]   1.68         9        .94      1.08        11
3                 .86         .53        11        .94       .64        12
4                 .59         .94         9        .95       .65        10
5              [illegible]    .91         9        .90       .63        13
Total Compr.      .95        5.73        12        .96      4.98         9

* Quoted from table in examiner's manual.
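
The manual's method of computing the P.E. of a score is not given here. A commonly used expression is P.E. = .6745 × S.D. × sqrt(1 − r); the sketch below assumes that expression, and the standard deviation used is a hypothetical figure, not one taken from Table 25.

```python
import math


def probable_error_of_score(sd: float, reliability: float) -> float:
    """Probable error of an obtained score under the conventional formula.

    Assumes P.E. = 0.6745 * S.D. * sqrt(1 - r), where r is the reliability
    coefficient and S.D. is the standard deviation of the scores.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return 0.6745 * sd * math.sqrt(1.0 - reliability)


# Hypothetical standard deviation of 10 score points with r = .91.
pe = probable_error_of_score(sd=10.0, reliability=0.91)
print(f"P.E. of a score = {pe:.2f} points")
```

On this reading, about half of the obtained scores would be expected to fall within one P.E. of the scores the pupils would earn on a perfectly reliable test.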


53. English Tests. 

The social importance of the use of correct language habits is so 
great that no teacher can afford to relax for a moment in his demands 
for correctness in the oral and written language of his students. 
Teachers of industrial education must share this responsibility in spite 
of the facts that very often this is a field outside their specific realm 
of interest, and that the written language used in their courses is 
usually quite limited. The demands made on the oral language skills 
are usually as extensive, however, as they are in other subjects. The 
professionally minded teacher of the industrial education subjects will 
be just as careful to watch and correct the language habits of his 
pupils as he is to watch his own language usages. Correct habits of
speech and writing come only through extensive and continuous practice
in correct usage. The industrial arts teacher must be in a position
to cooperate at all times with the English teachers, whose major
responsibility it is to see that these correct habits function in school
and the situations of daily life. 

Achievement in English expresses itself in a number of different 
ways, most of which are measurable with tests of some type. The 
most commonly measured phases of English are in the fields of word 
usage, grammar, and the mechanics of written composition. General 
merit of the total written language production is also measurable with 
the help of certain quality scales. Thus far, oral composition has 
evaded most attempts to objectify its measurement. A few tests and 
scales selected from the large number available in this subject are 
described and evaluated here from the standpoint of their use to the 
teacher of the industrial arts subjects. 




Measurement of Composition. English composition is ordinarily
measured in one of two ways. One method concerns itself with the
more or less objective evaluation of the general qualities of merit which
the production possesses. This procedure makes use of a scale composed
of written productions of different degrees of merit which have
been scaled or evaluated and arranged in ascending order of merit
in accordance with a numerical scale. Each different specimen is
assigned a numerical value in terms of the relative merit it possesses.
In general, the lower the merit of the specimen the lower the quality
score assigned to it. In actual use the scale is not taken into the
classroom, but is used by the teacher as a means of assigning general
merit ratings to the written productions prepared by students under
rather carefully controlled conditions, and on selected topics and
subject-matter. The other method of measuring composition is by checking
its form and freedom from mechanical errors. One of the very
useful scales for the measurement of English composition, the Willing
Composition Scale,³ makes use of both these procedures.

The compositions, after being rated for story value or general
merit, are rated for form value by making a careful check of all spelling,
punctuation, capitalization, grammatical, and usage errors. The
total number of such errors is then divided by the number of words
comprising the composition. This result is then multiplied by 100,
which expresses these form errors in terms of the number of such
errors per 100 words of composition. The number of form errors per
100 words declines as the quality or story value of the composition
rises.
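
The form-value computation described above can be sketched as follows. The function name and the sample counts are hypothetical.

```python
def form_errors_per_100_words(error_count: int, word_count: int) -> float:
    """Form errors expressed as the number of errors per 100 words written."""
    if word_count <= 0:
        raise ValueError("the composition must contain at least one word")
    if error_count < 0:
        raise ValueError("the error count cannot be negative")
    return 100.0 * error_count / word_count


# Hypothetical composition: 150 words with 9 errors of spelling,
# punctuation, capitalization, grammar, and usage combined.
rate = form_errors_per_100_words(error_count=9, word_count=150)
print(f"{rate:.1f} form errors per 100 words")
```

Expressing the errors as a rate rather than a raw count allows a long composition and a short one to be compared on the same footing.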


The Willing Scale is the only one of the commonly used scales
which makes any attempt to combine the ratings for form and quality.
The Thorndike Extension of the Hillegas Scale, the Hudelson Composition
Scales, and the Nassau County Extension (Trabue) of the Hillegas
Scale are all very useful general merit scales but confine their
measurement entirely to composition merit. It seems quite likely
that most industrial arts teachers interested in making any intensive
check on the merit of the written work of their students will find the
form values and the story values resulting from the use of the Willing
Scale the most useful measures.


Measurement of Grammar. Two somewhat contrasting types of
grammar tests are described in this section.


3 Willing, Matthew H., The Willing Composition Scale, Public School Publishing
Company, Bloomington, Illinois, 1918.


The Iowa Grammar Information Test⁴ is designed to meet the
need for a test of the purely informational aspects of English gram- 
mar. In addition to its survey use in English classes, it should prove 
to be a valuable measure of the formal background of grammar needed 
by students who are beginning the study of a foreign language. By 
means of 80 objective exercises of the three-answer multiple-choice 
type it samples into almost all the commonly taught phases of Eng- 
lish grammar. Two equal and parallel forms are available. Percen- 
tile norms are based on 1557 cases in Grades VII to XII. 

The Kirby Grammar Test⁵ is intended to be used in the measurement
of usage and grammatical errors in Grades VII to XII. The
pupil is tested on his knowledge of verbs, pronouns, and certain mis- 
cellaneous usages. For convenience in administration, the exercises 
are arranged in five divisions each containing usage exercises of the 
alternate-response type. The pupil is required to select the correct 
form for a given exercise and then to indicate (by recognition) the
grammatical rule which governs its use.

The reliability of the score on the principles test is about .90, 
but on the sentence test is only around .60. Norms are given for 
Grades VII to XII, but there is not a great difference between the 
norms for the different grades. This seems to indicate that pupils 
do not improve much in grammar during their secondary-school work. 
The actual working time of the test is about 35 minutes. 

Language Usage. The language-usage tests described in this section
illustrate two different types of measurement in language. The
first is an analytical test sampling many different language abilities.
The second is a general survey of language usage based on the recognition
of error.

The Iowa Elementary Language Tests⁶ are designed for survey
purposes in Grades IV to IX inclusive. However, the reliability of
measurement on the different parts permits a very useful type of
analysis of language limitations. The eleven phases of language
ability sampled by this test range over a total of 304 different items
with a total possible score of 338 points. In Test 1, which deals with
two phases of word meaning, the four-answer multiple-choice type of
exercise is used. Alternate-response exercises are used in two of the
three tests measuring phases of language usage. The recognition-correction
type is used in Tests 2-B, 6-A, and 6-B. A novel type of
technique utilizing keyed brackets is used in Test 5.

4 Cram, Fred D., and Greene, H. A., The Iowa Grammar Information Test,
Bureau of Educational Research and Service, Extension Division, University of
Iowa, Iowa City, 1935.

5 Kirby, Thomas J., The Kirby Grammar Tests, Bureau of Educational Research
and Service, Extension Division, University of Iowa, Iowa City, Iowa, 1920.

6 Greene, H. A., and Ballenger, H. L., Iowa Elementary Language Tests,
Educational Test Bureau, Minneapolis-Philadelphia, 1929.

The Wilson Language Error Test⁷ is available in two parts, each
consisting of three forms. The forms consist of short stories of about 
300 words which contain a number of common language errors. The 
pupil is to read the story and correct the language errors. The test 
is simple to administer and, when at least three forms are used, has 
valuable diagnostic power. The errors included in the tests are those 
commonly made as indicated by studies of pupils’ errors in several 
different schools. The reliability of the test is about .80. The test 
should prove valuable to industrial education teachers in diagnosing
common language errors. Norms are given for Grades VII to XII but
they show approximately the same levels of achievement in all the
grades.


54. Vocabulary Tests. 


Several good general vocabulary tests have been developed. These
are of some value to teachers of industrial education but they are
included here more for the suggestions they give concerning methods
that may be employed in developing suitable vocabulary or word
meaning tests to parallel the different industrial education courses.
Hunter’s⁸ W-4 Trade Names Test in Woodwork is one of the pioneer
efforts of industrial education teachers to develop tests along the line
of technical vocabulary. The inability of a pupil to understand the
meaning of words used in a given course does not necessarily mean
that he should not take the course, but indicates the need for special
instruction and drill in word meaning early in the course so that the
pupil can better profit from the instruction.

The Pressey Technical Vocabularies of the Public School Subjects⁹
should be most suggestive to industrial education teachers of possible
methods of developing tests upon the technical vocabulary used in the
industrial education subjects. This vocabulary list, which includes technical
vocabularies for fifteen school subjects, contains a list of technical
words pertaining to woodwork and elementary metal work, but it is
not entirely adequate for industrial education purposes because it
covers only these two courses.

7 Wilson, G. M., Wilson Language Error Tests, World Book Company,
Yonkers, New York, 1923.

8 Hunter, W. L., Shop Tests, The Manual Arts Press, Peoria, Illinois.

9 Pressey, Luella C., Technical Vocabularies of the Public School Subjects,
Public School Publishing Company, Bloomington, Illinois, 1923.


The procedure in selecting and rating the technical vocabularies 
was as follows. First, all unusual or technical words which appeared 
in commonly used textbooks of the subjects treated were tabulated 
and classified; second, the terms were rated according to importance 
by a group of special teachers of the subjects; and third, terms were 
classified as essential that were checked by more than half of the 
teachers. This is in no sense a test but is a suggestive vocabulary 
study for further development in the industrial education field. 
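
The third step of this procedure, retaining the terms checked by more than half of the teachers, can be sketched as follows. The word list and the tallies are invented for illustration and are not taken from the Pressey study.

```python
def essential_terms(checks: dict, n_teachers: int) -> list:
    """Return the terms checked as important by more than half the teachers."""
    if n_teachers <= 0:
        raise ValueError("at least one teacher must rate the list")
    return sorted(term for term, n in checks.items() if n > n_teachers / 2)


# Hypothetical tallies: number of teachers (out of 10) who checked each
# woodworking term as important.
tallies = {"mortise": 9, "chamfer": 7, "kerf": 5, "rabbet": 6, "spokeshave": 3}
print(essential_terms(tallies, n_teachers=10))
```

A term checked by exactly half of the raters, such as "kerf" in this invented tally, is not retained, since the rule calls for more than half.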


55. Spelling. 


Industrial education teachers have a distinct responsibility to teach 
the pupils in their classes to spell the technical words peculiar to their 
courses and to aid the other teachers in the school in maintaining 
proper spelling levels in written work. To equip the child with a 
method of learning to spell and to teach the spelling of commonly used 
words is the specific function of the elementary school, but it takes 
continual cooperation by teachers of all subjects to assure the lasting 
assimilation and mastery of these fundamental skills. Teachers of 
industrial education should recognize this responsibility. 

The majority of available spelling tests and scales have been de- 
veloped for the elementary school. However, at least two such spell- 
ing scales are definitely designed for use at the secondary-school level. 
These two scales which are briefly described here should prove of 
very definite value to industrial education teachers in measuring gen- 
eral spelling ability on the junior- and senior-high-school levels. They 
should also prove suggestive for the construction of spelling scales 
dealing with the technical words which are an integral part of instruc- 
tion in industrial education. 


1. SIXTEEN SPELLING SCALES STANDARDIZED IN SENTENCES FOR 
SECONDARY SCHOOLS¹⁰


These scales, frequently called the Seven-S Scales, consist of 16
separate and scaled lists of 20 words each. It requires about 5 minutes
to give any one of the scales, and for individual scores it is advisable
to use two and combine the scores. The tests have been carefully
prepared and afford a very satisfactory means for industrial
education teachers to measure the general spelling ability of their
students. The scales do not measure the ability to spell the related
technical words in industrial education.


10 Hudelson, Earl, Stetson, F. L., and Woodyard, Ella, Sixteen Spelling
Scales Standardized in Sentences for Secondary Schools, Bureau of Publications, 
Teachers College, Columbia University, New York City, 1920. 




The validity of the scales is based on the second and third thousand 
most common words as found in a composite list based on four sep- 
arate vocabulary studies. The words in the scales are so selected 
and arranged that each word is one-tenth of a sigma unit more diffi- 
cult than the preceding word. In administering the tests the words are 
given in sentences, but the pupils are required to write only the one 
word that is to be tested. The reliability is reported as being high. 
Norms are provided for Grades VII to XII. 
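
The one-tenth-sigma spacing described above can be illustrated with a short sketch. It is an illustration only and not the procedure used by the authors of the scales: the starting difficulty of 0.0 sigma, the function names, and the normal-curve link between difficulty and per cent correct are all assumptions made here.

```python
from statistics import NormalDist

def scale_positions(n_words=20, start_sigma=0.0, step=0.1):
    """Difficulty of each word in sigma units, rising by `step` per word.

    start_sigma is a hypothetical origin; the text gives only the spacing.
    """
    return [round(start_sigma + i * step, 1) for i in range(n_words)]

def expected_percent_correct(difficulty_sigma, ability_sigma=0.0):
    """Per cent of pupils expected to spell a word correctly.

    Assumes (not stated in the text) that ability is normally
    distributed and that a word is spelled correctly whenever
    ability exceeds the word's difficulty.
    """
    return 100 * NormalDist().cdf(ability_sigma - difficulty_sigma)

positions = scale_positions()
for number, sigma in enumerate(positions, start=1):
    print(number, sigma, round(expected_percent_correct(sigma), 1))
```

Under these assumptions the twentieth word of a list lies 1.9 sigma above the first, which shows how a 20-word list can span a wide range of difficulty in equal steps.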


2. SIMMONS-BIXLER STANDARD HIGH-SCHOOL SPELLING SCALE
Forms I, II, III, IV 11


This unusually valuable spelling scale for high-school use is based
upon an extensive program of investigation in high-school spelling 
undertaken by Mr. Simmons, and supplemented by a revision of the 
original material under the direction of Dr. Bixler. The result is a 
series of four forms of scales each containing a preliminary spelling 
test of 100 words, and 64 scaled lessons of 40 words each. The source 
of the vocabulary is the socially significant list of words comprising 
Horn's Basic Writing Vocabulary: 10,000 Words Most Commonly
Used in Writing 12 after the elimination of a number of abbreviations,
irregular forms, and a group of words spelled correctly by 90 per cent 
or more of high-school freshmen. The scale location of each word is 
based upon 200 spellings per grade. 

In addition to the scaled tests, an alphabetical list of 2910 words is 
presented with the percentile placement of each word for students in 
Grades IX to XII. Such a scale constitutes a valuable source of
instructional material for use in conducting the spelling “hospital” as
well as a useful source of test material of known difficulty.


56. Writing. 


The development of the initial skills in writing is one of the functions
of the elementary school, but if students are to be legible writers
after the formal education period they must be checked continuously 
by the teachers in all subjects. Although it is desirable that the basic 
writing habits become more or less automatic, it is also desirable that 
conscious writing be perfected to such a degree that it will still be 
legible and of good quality when it does become automatic. Indus- 
trial education teachers can aid in the development of good writing 
habits by demanding legible writing and printing from the students
and by displaying high-quality specimens of such work on the bulletin
board or on wall charts.

11 Simmons, Ernest P., and Bixler, H. H., A Standard High School Spelling
Scale, Turner E. Smith and Company, Atlanta, 1928.

12 Horn, Ernest, A Basic Writing Vocabulary, State University of Iowa
Monographs in Education, Series 1, No. 4, University of Iowa, Iowa City, 1926.

Writing has two characteristics which are important in rating, 
namely, quality and speed. Quality is usually determined by having 
samples of handwriting rated by qualified judges and the samples 
placed in order of merit on a linear scale. Samples of pupils’ writing 
obtained under standard conditions may then be compared and rated 
according to the value of the specimen on the scale which it most 
nearly resembles in quality. Speed is determined by counting the 
number of letters of standard copy written in one minute. Eighty 
letters per minute is considered a satisfactory speed for pupils in the 
ninth grade. 
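
The speed measure described above is simple arithmetic, and a brief sketch may make it concrete. The function names and the sample copy are hypothetical; only the 80-letters-per-minute figure for the ninth grade comes from the text.

```python
def letters_per_minute(written_text, minutes):
    """Count the letters written (ignoring spaces and punctuation)
    and divide by the time allowed, in minutes."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    letters = sum(1 for ch in written_text if ch.isalpha())
    return letters / minutes

def meets_speed_standard(rate, standard=80):
    """True if the rate reaches the standard (80 letters per minute
    is the ninth-grade figure given in the text)."""
    return rate >= standard

# Hypothetical sample: copy produced in a quarter of a minute.
rate = letters_per_minute("Four score and seven", 0.25)
print(rate, meets_speed_standard(rate))
```

Quality is deliberately left out of the sketch, since it is judged separately by comparing the specimen with a scale.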

Speed and quality are not rated together. If a pupil of average 
writing ability writes slowly and carefully the quality of his writing 
may improve. If he writes very rapidly there is likely to be a reduc- 
tion in quality. Both speed and quality can be improved through 
practice. If a pupil is to reach a maximum speed and quality he must
also have a good technique (proper position at the desk, correct holding
of pencil or pen and paper, etc.).

The rating of handwriting is valuable to industrial education teach- 
ers from another angle since it is similar to the rating of quality of 
workmanship on industrial education projects. It is almost identical 
with the rating of lettering in drawing, and it has many factors in 
common with the rating of soldering, riveting, boring, and splicing 
wire. It is also well to note at this point that speed and quality are 
measured as separate items. This is also true of speed and quality 
in rating the results of manual operations in industrial education. 

Two handwriting scales which should prove valuable to teachers 
in rating quality and in diagnosing faults in handwriting have been 
selected for description. A copy of these scales might well be posted 
in the shop and used by students to check and analyze samples of 
their handwriting and as constant reminders to improve their own 
writing. 

1. AYRES HANDWRITING SCALE 13


The Ayres Handwriting Scale now in most common use is known 
as the “Gettysburg Edition” because the samples in the scales are 
based upon copy from the first four sentences of Lincoln’s “Gettysburg
Address.” The scale consists of nine widely varying specimens of
handwriting graduated by tens from twenty to ninety. Each section
on the scale is represented by a twelve-line section from the
“Gettysburg Address.” The relative merit of the specimens was determined
by the differences in the lengths of time required by trained judges
to read each sample. Thus legibility becomes the criterion of merit.
This procedure is distinctly in contrast with that used by Thorndike 14
in the development of his Handwriting Scale. The results of the use
of the two types of scales in the classroom are quite similar, however,
in spite of the differences in their construction. Available standards
for the various handwriting scales are established only for the
elementary-school grades and accordingly are of little value above the
eighth grade. However, it may be useful to point out that the writing
of junior- or senior-high-school pupils should be quality 60 or above
on the Ayres Scale at a speed of approximately 80 letters per minute.

13 Russell Sage Foundation, New York, New York, 1912.


2. FREEMAN’S DIAGNOSTIC HANDWRITING SCALE 15

No discussion of measurement of handwriting would be complete
without at least a brief mention of the Freeman Chart for the
Diagnosis of Handwriting Faults. By the use of this analytical chart,
attention may be focused upon such qualities as uniformity of slant,
uniformity of alignment, quality of line, letter formation, and letter
and word spacing. Slant of letters may be revealed by drawing lines
through the letters indicating their slant. If the lines are not parallel
the lack of uniformity in letter slant is revealed. Alignment may be
shown by drawing lines parallel with the bottoms and tops of the
smaller letters. Weaknesses in letter formation are more difficult to
reveal and to classify since there are so many different types.
Improperly closed a's and o's and badly formed n's and w's are common
types of letter-formation difficulties. Too crowded as well as too
widely spaced letters and words operate to reduce the quality of
writing. The critical and ambitious teacher of industrial arts
subjects will find many opportunities to use this effective analytical scale
in bringing about distinct improvements in the handwriting of his
students.


57. Measurement of Mathematics.

Throughout the work in industrial arts subjects, frequent demands
are made on certain basic mathematical skills. In general, these skills
are presented for initial learning in the courses in arithmetic, algebra,
and plane geometry.


14 Thorndike, Edward L., “Handwriting,” Teachers College Record, Vol. 11:
1-93, March, 1910.

15 Freeman, F. N., Freeman Chart for Diagnosing Faults in Handwriting,
Houghton Mifflin Company, Boston, 1914.


Arithmetic. Although arithmetic is rightly considered an elemen- 
tary-school subject, it is an important factor in achievement in many 
secondary-school subjects. Arithmetical skills are in demand in prac- 
tically all classroom and shop activities in the industrial arts. Ac- 
curacy in making calculations in connection with shop work and other 
industrial subjects is an important factor in such achievement. Among 
the arithmetic tests which are most likely to be of use to the teacher of 
industrial education are such tests as the Compass Survey Tests,
Advanced Examination,16 for Grades IV to VIII, the New Stanford
Achievement Arithmetic Test 17 for Grades IV to IX, the Otis Reasoning
Tests in Arithmetic 18 for Grades IV to IX, and possibly certain
selected narrow function tests in arithmetic such as the Compass
Diagnostic Tests.

High School Mathematics. Algebra and plane geometry represent 
the phases of high-school mathematics of most interest to the teacher
of the industrial arts subjects. Among the first-year algebra tests
which may readily be of use to the shop teacher is the Columbia
Research Bureau Algebra Test.19 In its present form this test is in two
parts. Part I is designed to cover the algebra commonly taught in the
first semester of the course. Part II covers the second semester’s work.
A much more intensive type of measurement is provided by the Iowa
Unit-Achievement Tests in Algebra.20 These tests are in two equal
forms each made up of six tests covering the entire year’s work in 
first-year algebra. The standards represent achievement as of the 
time when the original instruction on the material was completed. 

Achievement in plane geometry may be effectively surveyed by 
such end-of-the-year tests as the Schorling-Sanford Plane Geometry 
Tests 21 or the Columbia Research Bureau Plane Geometry Tests.22


16 Greene, H. A., Knight, F. B., Ruch, G. M., and Studebaker, J. W., The
Compass Survey Tests, Scott, Foresman and Company, Chicago, 1927.

17 Ruch, G. M., Terman, L. M., and Kelley, T. L., The New Stanford
Achievement Tests, World Book Company, Yonkers, New York.

18 Otis, Arthur S., Otis Reasoning Tests in Arithmetic, World Book Company,
Yonkers, New York, 1923.

19 Otis, Arthur S., and Wood, Ben D., Columbia Research Bureau Algebra
Test, World Book Company, Yonkers, New York, 1927.

20 Greene, H. A., and Piper, A. H., The Iowa Unit-Achievement Tests in
First-Year Algebra, Bureau of Educational Research and Service, Extension
Division, University of Iowa, Iowa City, 1931.

21 Schorling, Raleigh, and Sanford, Vera, The Schorling-Sanford Plane
Geometry Tests, Teachers College Bureau of Publications, Columbia
University, New York City, 1925.

22 Hawkes, Herbert E., and Wood, Ben D., Columbia Research Bureau Plane
Geometry Test, World Book Company, Yonkers, New York, 1926.


For periodical measurement of achievement over relatively small
sections of plane geometry instruction, tests such as the Lane-Greene
Unit-Achievement Tests in Plane Geometry 23 may be used.

In addition to the tests in arithmetic, algebra, and plane geometry 
which have been described in this chapter there is a real need for tests 
of related mathematics to parallel the several industrial education 
courses in electricity, auto mechanics, sheet-metal working, drawing, 
and printing. Something similar to the type of inventory measurement
secured by the Kilzer-Kirby Inventory Test for the Mathematics
of High-School Physics 24 is greatly needed in these fields. Hunter 25
has recognized this need and has done some pioneer work by developing
short tests in shop mathematics, shop arithmetic, and geometry.
These tests are neither standardized nor long enough to be highly
reliable, but they are of some value for measuring the mathematics
related to industrial arts and for the suggestions they offer for
further development along similar lines.


58. Measurement in Sciences Related to Industrial Arts. 

Certain contributions of the high-school sciences are apparent in 
many of the industrial arts courses. Accordingly, complete measurement
of achievement in these courses must at least include some attention
to the fields of high-school physics, chemistry, and general science.
Such survey tests as the Columbia Research Bureau Physics Test 26
will be found to be very effective measures of end-of-the-year
achievement in physics. In a similar way the Columbia Research Bureau
Chemistry Test will be an effective survey instrument for use by the
industrial arts teacher. General science covers so many different
phases of the sciences that without doubt it is one of the most useful
fields to survey in any attempt to discover the range of information in
the sciences held by industrial arts students. For this purpose one of
the most useful tests is the Ruch-Popenoe General Science Test.27
Standards are provided for one-semester and year courses in this
subject.


23 Lane, Ruth, and Greene, H. A., The Lane-Greene Unit-Achievement Tests
in Plane Geometry, Ginn and Company, Boston, Mass., 1929.

24 Kilzer, L. R., and Kirby, T. J., Inventory Test for the Mathematics of
High-School Physics, Public School Publishing Company, Bloomington, Illinois,
1929.

25 Hunter, W. L., Shop Tests, Series No. 2, The Manual Arts Press, Peoria,
Illinois, 1927.

26 Farwell, H. W., and Wood, Ben D., Columbia Research Bureau Physics
Test, World Book Company, Yonkers, 1926.

27 Ruch, G. M., and Popenoe, H. F., Ruch-Popenoe General Science Test,
World Book Company, Yonkers, New York, 1923.


SUMMARY 


Achievement in industrial education cannot be completely and ef- 
fectively measured without the use of supplementary educational tests 
selected from other related fields. Since the language skills as repre- 
sented by reading, language usage, grammar, spelling, handwriting, 
vocabulary, and composition abilities are so basic and so fundamental 
to achievement in industrial education subjects, considerable attention
is given to the discussion of tests in these fields. The social
importance of the correct use of these language skills is so great that no
teacher can afford to relax for one moment in his demands for correct- 
ness in the oral and written language habits of his students. 

Demands are made on certain of the high-school science courses 
by many of the units of work in industrial education courses. Ac- 
curacy and speed in making certain mathematical calculations in con- 
nection with shop work are also desirable accomplishments. Accord- 
ingly, the teacher of industrial education will wish to sample somewhat 
liberally the abilities in these other related fields of educational
achievement. 


SUMMARY EXERCISES FOR DISCUSSION 


1. What educational fields appear to be most closely related to achievement 
in industrial arts? 

2. In what specific ways does achievement in industrial arts and other
high-school subjects appear to be related to the ability to read rapidly
and well?

3. Catalogue the major language skills which should receive the attention of
the teacher in industrial arts subjects.

4. In your judgment, what is the most acceptable basis for the selection of a
high-school spelling vocabulary?

5. What is the responsibility of the teacher of industrial arts subjects for 
satisfactory mastery of spelling and handwriting on the part of his stu- 
dents? 

6. Secure a copy of the Ayres Handwriting Scale and rate at least a dozen 
samples of handwriting representing a wide range of quality. After two 
or three days rate the samples again without reference to the scores 
previously assigned. On what percentage of samples did your two sets 
of marks agree within five points on the scale? 

7. List a few of the more important arithmetical skills which appear to persist 
into the high school. 

8. Why are there no adequate diagnostic tests in algebra or geometry? 

9. What special procedures can you suggest for improving problem solving 
either in arithmetic, algebra, or in the sciences? 

10. Compare two selected algebra or geometry tests showing complete lists of
specific skills measured by each.
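
The tally called for in Exercise 6 above can be sketched in a few lines. The function name and the sample ratings are hypothetical; the five-point tolerance is the one named in the exercise.

```python
def percent_agreement(first_ratings, second_ratings, tolerance=5):
    """Percentage of samples whose two ratings differ by no more
    than `tolerance` points on the scale."""
    if len(first_ratings) != len(second_ratings) or not first_ratings:
        raise ValueError("need two non-empty rating lists of equal length")
    agreeing = sum(
        1 for a, b in zip(first_ratings, second_ratings)
        if abs(a - b) <= tolerance
    )
    return 100 * agreeing / len(first_ratings)

# Hypothetical ratings of a dozen samples on two occasions.
first = [20, 30, 40, 50, 60, 70, 80, 90, 50, 60, 40, 70]
second = [20, 40, 40, 50, 70, 70, 80, 90, 50, 60, 50, 70]
print(percent_agreement(first, second))
```

With these invented ratings, nine of the twelve samples agree within five points, so the function reports 75 per cent.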




SELECTED REFERENCES 


Greene, H. A., Jorgensen, A. N., and Kelley, V. H., Examiner's manual for Iowa
Silent Reading Test, Advanced. Revised Edition. Yonkers, New York:
World Book Company, 1931.

Greene, H. A., and Piper, A. H., Examiner's manual for the Iowa
Unit-Achievement Tests in First-Year Algebra. Iowa City: Bureau of
Educational Research and Service, University of Iowa, 1931.

Haggerty, M. E., and Laura C., Manual for Administering and Interpreting
Silent Reading Examination, Sigma 3. Yonkers, New York: World Book
Company, 1920.

Horn, Ernest, A Basic Writing Vocabulary. Iowa City, Iowa: Department of
Publications, University of Iowa, 1926.

Hudelson, Earl, Stetson, F. L., and Woodyard, Ella, Sixteen Spelling Scales
Standardized in Sentences for Secondary Schools. New York: Bureau of
Publications, Teachers College, Columbia University, 1920.

Hunter, W. L., Shop Tests. Peoria, Illinois: The Manual Arts Press, 1927.

Kirby, T. J., The Kirby Grammar Tests. Iowa City, Iowa: Bureau of
Educational Research and Service, University of Iowa, 1920.

Lane, Ruth E., and Greene, H. A., Examiner's manual for Lane-Greene
Unit-Achievement Tests in Plane Geometry. Boston: Ginn and Company, 1929.

Pressey, Luella C., Technical Vocabularies of the Public School Subjects.
Bloomington, Illinois: Public School Publishing Company, 1923.

Simmons, E. P., and Bixler, H. H., A Standard High School Spelling Scale.
Atlanta: Turner E. Smith and Company, 1928.

Smith, H. L., and Wright, W. W., Tests and Measurements. New York: Silver,
Burdett and Company, 1928.

Willing, M. H., The Willing Composition Scale. Bloomington, Illinois: Public
School Publishing Company, 1918.

Wilson, G. M., Examiner's manual for Wilson Language Error Test. Yonkers,
New York: World Book Company, 1923.

Wilson, G. M., and Hoke, K. J., How to Measure (Revised). New York: The
Macmillan Company, 1928.


CHAPTER X 
TESTING TECHNIQUES IN INDUSTRIAL EDUCATION 


1. TYPES OF OBJECTIVE TEST EXERCISES 


Certain types of objective exercises which are useful in construct- 
ing tests in industrial education are discussed and evaluated in this 
chapter. Many different forms of test exercises have been used suc- 
cessfully in other subjects to objectify pupils' responses. Doubtless, 
new or modified types will be developed to meet future testing needs. 
Industrial education teachers should become critical of such pro- 
cedures, and should learn to select and use the types of test exercises 
best adapted to their instructional materials. 


59. Objective Techniques Adapted to Testing in Industrial Education. 


In the measurement of technical knowledge the usual types of ob- 
jective exercises can be used. The measurement of manipulative abil- 
ity requires types of objective exercises designed specifically for the 
purpose. Two general types of test exercises are used in measuring 
information and manipulative ability, namely, (1) the recall type and 
(2) the identification type. In a recall exercise the pupil is called
upon to supply the answer from memory. In the recognition or iden- 
tification types, the pupil must choose the correct response from sev- 
eral possibilities. The latter type involves the recalling of character- 
istics and relationships but does not call upon memory for the major 
items of the exercise. 

It would be hopeless to attempt to illustrate all the possible forms 
of objective exercises which have been used in testing, but several 
examples are given here which should suggest to the industrial education
teacher, in his construction of tests, ways of meeting his measurement
needs in the classroom and shop. The different objective types
should be studied carefully so that the teacher can readily select those
best adapted to measuring different phases of information and manip- 
ulative ability. 



60. Classification of Objective Test Exercises. 

Objective test exercises of types most likely to be of use in the
industrial education classroom and shop fall into the following
classifications:

I. Recall.
    A. Simple recall.
    B. Completion exercises with one or more key words omitted.
    C. Completion exercises with answers suggested or controlled.

II. Recognition.
    A. Multiple-response tests.
        1. One correct response.
        2. Multiple-answer exercise with varying degrees of merit.
        3. Multiple answers with one or more correct answers.
    B. True-false exercises.
        1. Yes-no questions.
        2. True-false statements.
        3. Diagram and true-false.
        4. Double true-false statements.
    C. Matching exercises.
        1. Word matching.
        2. Picture matching.
        3. Unbalanced column.
    D. Rearrangement test exercises.
        1. Order of operations.
        2. Classification.

III. Performance.
    A. Quality or accuracy.
    B. Identification of tools and materials.
        1. Simple recognition.
        2. Recognition and analysis.
    C. Technique.
    D. Speed or rate of response.


61. Recall Exercises. 
The varied forms of the recall-test exercises have been widely used
in test construction in all fields. In an analysis of 375 tests from
several instructional fields Conneau 1 found that nearly 30 per cent of all
the test exercises were of the completion type. The recall exercises
are of real value in testing information in industrial education, but of
less value in measuring manipulative activity.

1 Ruch, G. M., The Objective or New-Type Examination, Scott, Foresman and
Company, New York, Chapter VIII, p. 189.

The recall type of test exercise has gained favor as a device for 
the measurement of information in all fields because it is almost en- 
tirely objective when properly constructed and administered. Guess- 
ing and chance factors operate very slightly. The recall question is a 
natural form of questioning and is easily and rapidly scored. An 
important limitation of the recall type of test item is that it tends to 
be merely factual in character. The recall exercise also requires a 
great deal of care in preparation, for the reason that unless the missing 
clue words are carefully chosen several answers will be possible, which 
will make the scoring difficult at times and bring in the subjective 
judgment of the teacher. In constructing a completion test it has been 
found advisable to have each blank call for a single idea, and to avoid 
a large number of blanks by omitting only a few key words. 


SAMPLE RECALL EXERCISES
Simple Recall


1. Directions: Answer each of the following questions with a single word.
Write the word on the line after the last word of the question.


4. What liquid is commonly used to thin cabinet varnish? ......


2. Directions: After each finishing material write the proper thinner.
1. Shellac ......
2. Varnish ......
3. Paint ......
4. Lacquer ......
5. Enamel ......
6. Kalsomine ......


Completion Exercises


3. Directions: The following statements are to be completed by adding one,
and only one, word in each blank.


1. Oak is a good cabinet .......
2. The length of a meter is ...... feet ...... inches.
3. Cabinet glue should not be heated above ...... degrees Fahrenheit.
4. Wood should not be ...... across the grain.
5. The surface of a cabinet wood is prepared for finishing by ......


Completion Exercises with Answers Suggested 


4. Directions: Complete the following sentences by inserting one of the words 
found in the list on the right of the page. The words are to be used only once. 


1. A cabinet scraper is used to ...... a surface.                 1. Squareness
2. A try-square is used to test for ......                        2. Smooth
3. A ...... is used for cutting lengthwise of the grain.          3. Mill
4. A ...... file is used to shape the edge of a cabinet scraper.  4. Ripsaw


62. Multiple-Response Exercises. 

The multiple-response test is one of the most satisfactory objective 
test exercises to use in the measurement of information and reasoning. 
On the average it is somewhat more reliable than the true-false type,
but is probably not so reliable as the recall test when the tests are
equated for length in terms of the number of items in each. It is
fairly easy to score, but not easy to construct.

Guessing is a factor which must be taken into account in every
objective test form in which the single correct answer must be selected
from two or more suggested responses. In theory, at least, guessing is
reduced in multiple-response items by increasing the number of
suggested responses. There is a practical limit to this, however, for it
soon becomes apparent that it is impossible to select large numbers
of equally plausible wrong responses for an item. If an exercise were
made up with five responses, three of which were so obvious that they
would be eliminated at once by a pupil with only a minimum of
information, the test exercise would be no more effective than it would be
if it were made as an alternate-response exercise to begin with. As a
matter of fact, it would be made less effective by the inclusion of the
useless material. The tendency at the present time seems to be in the
direction of the three-response type. In any event, it is usually
desirable to prepare the exercises with the same number of responses
throughout the test, if it is to be corrected for chance by the usual
formula. The factor of guessing in objective tests is discussed in more
detail in Section 71 of this chapter.

In the multiple-choice form of exercise the pupil indicates the
correct response by underlining or checking the answer, or by placing the
number of the correct response on a blank at the end of the exercise.
The writing of the number of the correct response rather than the
response itself reduces the amount of writing required of the student
and in general is quite satisfactory and objective.
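
The correction for chance mentioned in this section is not written out here, so the sketch below assumes the conventional form, rights minus wrongs divided by one less than the number of responses per item. The function name is hypothetical, and omitted items are assumed to count neither as right nor as wrong.

```python
def corrected_score(right, wrong, n_responses):
    """Score corrected for chance: R - W / (n - 1).

    Assumes every item offers the same number of responses,
    which is why a uniform number per test is desirable.
    """
    if n_responses < 2:
        raise ValueError("each item needs at least two responses")
    return right - wrong / (n_responses - 1)

# The same 30 rights and 8 wrongs under three item formats.
for n in (5, 3, 2):
    print(n, corrected_score(30, 8, n))
```

With two responses the formula reduces to rights minus wrongs, the heaviest penalty of the three formats shown, which agrees with the remark that guessing matters most in alternate-response exercises.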




SAMPLE MULTIPLE-RESPONSE EXERCISES: ONE CORRECT ANSWER


5. Directions: Each of the following statements can be correctly completed 
by one and only one of the numbered expressions. You are to write the number 
which stands for the correct expression on the line at the right of the exercise. 

1. Lacquer is thinned with
(1) turpentine (2) alcohol (3) amyl acetate (4) mineral oil ......
2. A shellac brush is cleaned with
(1) turpentine (2) water (3) alcohol (4) gasoline ......
3. No. “00” sandpaper is coarser than
(1) No. “0” (2) No. “000” (3) No. 2 (4) No. 5 ......
4. Outside paint is thinned with
(1) water (2) paint remover (3) linseed oil (4) alcohol ......


MULTIPLE-ANSWER EXERCISES WITH ANSWERS OF VARYING DEGREES
OF MERIT


6. Directions: Underline the one word in the parentheses of each statement
which best completes the statement.
1. (Walnut, bass, pine, balsa) is a favorite cabinet wood.
2. Mahogany is used for making (barrels, ships, furniture, fence posts).
3. Cypress is used in making (water tanks, beds, floors, boxes).
4. Redwood grows in (Indiana, Iowa, California, Louisiana).


MULTIPLE-ANSWER EXERCISES WITH ONE OR MORE CORRECT ANSWERS


7. Directions: Underline all the words in each parenthesis which will make
true statements.
1. (Cypress, pine, redwood, elm) is used in making water tanks.
2. (No. 2, No. “000,” No. 4, No. “00”) is fine sandpaper.
3. (Oak, pine, basswood, ebony) is soft wood.
4. (Maple, walnut, fir, yellow pine) is good cabinet wood.


63. True-False Exercises. 


The true-false or “yes-no” form of test exercise is one of the most 
popular types for measuring information. The true-false exercise is 
objective, easy to score, has wide adaptability, permits extensive sam- 
pling in short working periods, and if ingeniously devised may be 
used to measure reasoning as well as memory. However, it is not 
adapted to the measurement of manipulative skills. 

High-quality objective exercises of the true-false type are not so
easy to construct as it might at first appear. Only materials which 
are strictly true or false should be put into true-false exercises. Double 
negatives and trick questions have no place in true-false questions. 
They should be stated in simple, direct language. The purpose is to
get an objective measure of the pupil's knowledge, and not to confuse
or bewilder him intentionally. Any true-false items that are likely to 
suggest the correct answer to other items should be widely distributed 
in the test. If reliable results are desired, true-false statements and
other forms of test exercises should not be dictated to the class.
Paterson 2 reports that dictating test items to a class tends to reduce
the reliability of the test. If possible, separate copies of the test 
should be prepared for each pupil in the class. 

The chief limitations of true-false test exercises are that they are
open to the influence of guessing and chance factors, and also that they
are rather difficult to construct so that the items will be strictly true or
false without being too obvious. Attempts to make them less obvious
usually make them ambiguous. Both these limitations can be overcome
to a large extent by the thoughtful test worker if he will take
unusual care in the construction of the exercises, and correct for
guessing when scoring the test.

Two types of alternate-response test exercises (true-false; yes-no)
are recognized: the single and the double types. The single true-false
statement is the more common type and has either a true or a false
statement for each fact measured in the test; the double true-false
has both a true and a false statement for each concept in the test,
both of which must be answered correctly in order for the pupil to
score on the pair. The paired or double true-false test is a later form
designed to control the effect of chance somewhat more definitely by
forcing the pupil to respond to both a true and a false item on
each fact or concept. The double true-false test undoubtedly does
eliminate chance to some extent, but the two test items which relate
to the same point must be so distributed in the test that the pupil
cannot make a direct comparison of them. A test made up of 100 paired
true-false items has been shown to be more reliable than 100 items
stated in usual alternate form, but it requires the same amount of
space and time as would be devoted to an ordinary true-false test of
200 items. The reliability and the apparent validity of measurement
resulting from the paired exercise test will be somewhat higher.3
However, the difficulty of making suitable statements of important or basic
items in the subject-matter in paired form (in both true and false
form) is very great. Experience in formulating paired exercises
soon makes it apparent that certain subject-matter concepts lend
themselves to statement in one form much more readily than they do
in the other. In the development of ordinary true-false examinations
it will be found to be a very good practice to go through the basic
facts carefully, writing first the false statements which are to go into
the test, and then follow with a like number of true statements, which
usually are much more easily stated.

2 Paterson, D. G., Constructing New-Type Examinations, World Book
Company.

3 Greene, H. A., “A New Correction for Chance in Alternate Response
Exercises,” Journal of Educational Research, pp. 102-107, February, 1928.
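
The scoring rule for double true-false items described in this section, credit only when both members of a pair are answered correctly, can be sketched as follows. The data layout, the item numbers, and the function name are hypothetical and are chosen only for illustration.

```python
def score_paired_items(key, responses, pairs):
    """Count the pairs in which both statements are marked correctly.

    key       -- dict mapping item number to the correct answer (True/False)
    responses -- dict mapping item number to the pupil's answer;
                 an omitted item is simply absent and earns no credit
    pairs     -- list of (item, item) tuples testing the same concept
    """
    score = 0
    for first_item, second_item in pairs:
        first_right = responses.get(first_item) == key[first_item]
        second_right = responses.get(second_item) == key[second_item]
        if first_right and second_right:
            score += 1
    return score

# Hypothetical four-item test: items 1 and 4 form one pair, 3 and 2 another.
key = {1: True, 2: False, 3: True, 4: False}
pairs = [(1, 4), (3, 2)]
pupil = {1: True, 4: False, 3: True, 2: True}
print(score_paired_items(key, pupil, pairs))
```

The paired items are stored apart from one another in the key, which mirrors the requirement that the two statements on a concept be separated in the test itself.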

True-false exercises have been criticized by some writers because 
they were of the opinion that the presentation of false forms had a 
negative effect on learning. Studies by Remmers and Remmers 4 and
by Roberts and Ruch 5 have shown that the negative suggestion effects
are not so significant as might be supposed. The evidence indicates 
that true-false statements are a stimulus to learning and are of real 
value, although this has not been conclusively proved. The burden 
of proof now seems to rest with those who believe that the true-false 
question exerts a negative effect. 


SAMPLES OF TRUE-FALSE EXERCISES
True-False Statements

8. Directions: Examine each statement below and decide whether it is true
or false. If true, underline true; if false, underline false. If an item is too
hard, skip it and go on to the next one. Do not guess. This test will be
corrected for guessing.

1. The carburetor maintains the correct proportion of
   fuel and air at all speeds.                                True False
2. The generator on an automobile steps up the primary
   current into a high tension spark.                         True False
3. The transmission makes possible different speeds
   forward and reverse.                                       True False
4. The distributor turns the motor over until it has drawn
   gas in and compressed it.                                  True False


Diagram and True-False

9. Directions: Read the following statements about the drawing (Fig. 6) and
mark them true or false by referring to the drawing. If a statement is true,
underline the word true; if false, underline the word false. This test will
be corrected for guessing.

Fig. 6. Isometric Drawing of Radio.


4 Remmers, H. H., and Remmers, E. M., "The Negative Suggestion Effect
of True-False Examination Questions," Journal of Educational Psychology,
Vol. 17: 52-56, 1926.

5 Roberts, H. M., and Ruch, G. M., "The Negative Suggestion Effect of True-
False Tests," Journal of Educational Research, Vol. 18: 112-116, September, 1928.




1. The radio cabinet is 2' long.                  True False
2. The top of the cabinet is 75" thick.           True False
3. The width of the base is 9".                   True False
4. The depth of the cabinet is 8%.                True False
5. The base of the cabinet is 1" thick.           True False
6. The length of the top is not given.            True False
7. The front panel of the cabinet is 8" high.     True False


64. Matching Exercises. 

Matching exercises are valuable in industrial education for meas- 
uring relationships between items of information, or tools and ma- 
terials. The pupil taking the test is called upon to recognize relation- 
ships between a test list and an answer list. The pupil usually writes 
the number of the related item before or after the unnumbered item. 
Matching tests are objective, easily scored, and in certain subject fields
easy to construct. They can be used in measuring factual materials
or judgment. In constructing matching exercises, it is important to
have 10 or more items to reduce the operation of chance, but if very
long lists are used, 25 or 30 items, considerable time is lost by the pupil
in sorting the various related pairs. An improvement in this practice
is to arrange two separate groups of 10 or more items.
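
Scoring a matching exercise of this kind amounts to comparing the number the pupil writes in each blank with a key. The following is a minimal sketch in Python, not a procedure given by the authors; the function name `score_matching`, the three-item key, and the treatment of empty blanks as omissions are all assumptions made for illustration.

```python
def score_matching(key, responses):
    """Return (right, wrong) for one matching exercise.

    key       -- maps each answer-list word to the number expected in its blank
    responses -- maps each answer-list word to the number the pupil wrote,
                 or None when the blank was left empty (assumed to be an omission)
    """
    right = 0
    wrong = 0
    for word, expected in key.items():
        written = responses.get(word)
        if written is None:
            continue  # omitted blanks count neither right nor wrong
        if written == expected:
            right += 1
        else:
            wrong += 1
    return right, wrong


# Hypothetical three-item key and one pupil's paper.
key = {"Nail": 1, "Rip": 2, "Tenon": 3}
responses = {"Nail": 1, "Rip": 3, "Tenon": None}
print(score_matching(key, responses))
```

Keeping omissions separate from wrong responses matters only if the teacher later wishes to apply a correction for guessing.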




SAMPLES OF MATCHING TESTS

Word Matching
10. Directions: Below are two columns of words which are related in meaning.
Write the numbers of the words in the left-hand column on the blanks in the
right-hand column so that they will show the items which are related. For
example, "Hammer" is number 1. You find the word "Nail" in the right-hand
column, so put the figure 1 in the blank before the word "Nail," i.e., 1. Nail.
Indicate the relation of all the other items in a similar manner. Do not guess.

 1. Hammer              .... Brace
 2. Saw                 .... Sheet copper
 3. Mortise             .... Alcohol
 4. Open-grain wood     .... Linseed oil
 5. Turpentine          .... Sandpaper
 6. Shellac             .... Tenon
 7. Wood                .... Nail
 8. Oil paint           .... Rip
 9. Bit                 .... Paste filler
10. Tin snips           .... Varnish
11. Knife               .... Thumb tack
12. Drawing board       .... Oilstone

6 Ruch (The Objective or New-Type Examination, Scott, Foresman and Company)
... desirable to have less than 10 complete ... be made in one column or the
other to aid in caring for the chance element.


Parts Identification Test

11. Directions: Study the drawing of the automobile engine (Fig. 7). Notice
that some of the engine parts are numbered. Below the drawing are the names
of the engine parts which are numbered in the drawing. Write the numbers
appearing in the drawing in the blanks opposite the correct name for each part.


Fig. 7. Ford V-8 Motor.


Spark plug              Gas pump
Exhaust manifold        Generator
Water pump              Air cleaner
Starter                 Transmission
Carburetor              Crank shaft
Breather pipe           Cylinder head
Distributor             Fan
Fan belt                Engine supports

65. Rearrangement Exercises. 


The rearrangement test is peculiarly adapted to testing information
in industrial education which involves chronological order, order
of operations, and classification of materials according to grades or
quality. It lends itself very well to measuring information which
involves the employment of skills in sequence, or as a means of checking
in a verbal fashion the plan of a job involving motor skills.




Rearrangement exercises which involve five to eight relations probably
do not need correcting for chance. If carefully constructed,
rearrangement exercises are objective and easy to score. They are
useful in diagnosing special difficulties as they are revealed by a pupil's
failure or ability to recognize proper relations in a test situation.
Although this form of test exercise is quite difficult to construct
and requires considerable space, it has been used successfully for
measuring home mechanics. The following examples from the Newkirk-
Stoddard Home Mechanics Test ⁷ serve to illustrate its operation in
this field.


SAMPLES OF REARRANGEMENT EXERCISES
12. Directions: On this and the following pages are given a number of common
jobs in home mechanics. The proper steps for carrying out each job are
given here, but these steps are not placed in the correct order. Examine each
job in turn, and decide which step should come first. Place the number of this
step in the first pair of parentheses, that is, the parentheses at the left. In the
same way insert the numbers of the remaining steps in the proper order or
sequence, so that when you have finished, one can read the numbers in the
parentheses from left to right and find out just how to carry out the steps in
the whole job.

Sample Job: To Set Casters.
(1) Drive caster-sheafs.
(2) Select a bit and drill the hole.
(3) Select a suitable caster.
(4) Insert the caster and test.
(5) Mark the point for the location of the caster.

Rearrange the numbers to show correct procedure:
(3) ( ) ( ) ( ) ( )

13. Directions: For the jobs which follow, the connections called for are to
be indicated right on the diagrams by drawing in lines with pencil or pen. Read
the directions for each job very carefully. When you have figured out how the
wires should go, mark them in neatly and clearly. If you don't know all the
connections, mark those you think are right.

Job 4. To Connect Three Dry Cells in Parallel.
Directions: Show the correct circuit by drawing lines between the black dots.

Dry Cells


7 Newkirk, L. V., and Stoddard, George D., Newkirk-Stoddard Home Mechanics
Test, Bureau of Educational Research and Service, State University of Iowa,
Iowa City, Iowa, 1928.




II. PERFORMANCE TEST EXERCISES


66. Objective Performance Exercises. 


Objective exercises for the measurement of performance have not
been developed to the point of perfection that characterizes many of 
the objective pencil-and-paper tests of information. There is need for 
much additional experimentation to evaluate the most useful types of 
exercises for the measurement of performance. 

Such performance exercises as have been developed are predominantly
of the recognition type. The object is to allow pupils to modify
materials with tools or instruments or to recognize types or qualities of 
materials and to check the responses in an objective manner. Ob- 
jective performance exercises must tell the pupil exactly what to do 
and not allow the order of major steps to depend on recall. For 
example, if it is desired to measure a pupil’s ability to bore a hole in 
1-inch stock with a No. 6 bit, he should be given the bit, brace, and
stock with directions, but should be carefully supervised to see that 
the directions are followed. This will result in a sample of the pupil’s 
work under standard conditions which can be rated and compared 
with similar samples of other pupils' work. If, on the other hand, a
pupil is told to get a No. 8 bit and to bore the hole and he uses a
No. 16 bit with which to bore the hole, the sample will not be entirely
comparable with the No. 8 samples. If the pupil makes a mistake in
the selection of the specified size of bit, it may indicate in a rough way
that he does not know much about the sizes of bits, but it adds an
uncontrolled variable to the performance factor without completely
testing the pupil's knowledge of the sizes of bits. Knowledge of the
different sizes of bits and ability to bore a hole are two different things,
from the standpoint of test construction. The situation is similar to
that in which a teacher asks a pupil to write down the name of a
cabinet wood, and the pupil writes the word "wallnut." The pupil's
response is correct, but his spelling is faulty. The temptation is to
lower the grade because of a misspelled word although the answer is
correct. In this case the pupil should have a mark in spelling and a
mark for his knowledge of the wood, but these two variables should 
not be allowed to interfere with each other. The same is true of the 
sizes of bits and the ability to bore a hole. They are independent 
variables both of which should be tested, but not in the same type of 
situation. 

Performance test exercises may be divided into four groups according
to use, namely, (1) tests of quality or accuracy, (2) identification
of materials or tools, (3) technique of tools or instruments, and
(4) speed or rate of response. These four concepts, which have been
discussed in Chapter V, will be briefly reviewed here with illustrations
of test exercises which have been used in measuring the factors.


67. Quality or Accuracy Exercises. 

The quality or accuracy of industrial education work is determined
by carefully evaluating materials which have been modified in some
significant way with tools, materials, or instruments. A test exercise
for measuring quality of workmanship must allow the pupil to modify
materials under genuine and controlled shop conditions, so that the
results can be rated with reasonable objectivity and compared with
the results of other pupils who have practically the same background
and physiological development. The pupil not only must modify
materials under controlled conditions, but also must modify enough
materials to give an adequate or reliable sampling of the abilities being
measured.

The following guiding principles may be helpful in constructing 
objective test exercises of quality: 

1. Provide a job which will give adequate samples of the results of 
the tool or instrument operations being measured. 

2. Give specific directions for doing the work. 


3. Provide all tools and materials necessary.

4. Measure the results by physical measurements, quality rating
scales, and where necessary, by inspection.


SAMPLES OF QUALITY OR ACCURACY EXERCISES
14. Operation: To saw to a line with an 8-point cross-cut saw. Saw as
accurately as you can.

Fig. 8.

Materials: Eight-point cross-cut saw in good condition. A soft wood board
free from knots, 6 ... 6" x 2, surfaced on four sides and laid off as shown in
the drawing.
Directions: Place board in position for sawing.




15. Operation: To bore a hole with a ½" wood bit.
Tools and Materials: A ½" bit and brace, a piece of soft wood as indicated in
the diagram, a bench, vise, and try-square.

Fig. 9.


Directions: Bore holes through the wood block perpendicular to the surface 
at the point indicated. 

16. Operation: To cut a line using tin snips. 

Tools and Materials: A pair of sharp, properly adjusted tin snips, a piece of 
XX tin plate as shown in the diagram with the cutting line marked. 


Fig. 10.


Directions: Cut the tin into two 1" strips. Cut directly on the line. 


Examples 14, 15, and 16 are samples of short manipulative test exercises
designed to measure ability to saw to a line with a cross-cut saw,
ability to bore a hole perpendicular to a surface with an auger bit, and
ability to cut strips of tin with tin snips. Many tool operations can
be tested in this manner. The construction of complete tests of qual- 
ity or accuracy is discussed in detail in Chapter XI. 

68. Identification Exercises. 

Identification exercises are very useful for testing the pupil's ability 
to recognize materials, instruments, and tools. They are also used for 
measuring a pupil’s ability to analyze special difficulties. The fol- 
lowing significant principles in the construction of identification exer- 
cises should prove suggestive to the teacher: 

1. Provide a representative sample of the objects to be identified.

2. Suspend materials so that they can readily be examined.

3. Score the items by checking the objective written responses. 


The identification exercise is easy to use and is objective in scoring,
and the same sample panel can be used, by changing the items hung on it,
for testing the pupil's ability to identify a number of different materials,
fixtures, or tools. The authors have found it advantageous to suspend the
items because it allows the pupil to hold the items in his hands, to lift
them, to smell them, etc. This gives a natural psychological approach. It
also prevents certain optical illusions. For example, it is difficult to
realize that a 6 penny nail is not a 3 penny common when it is fastened
securely beside a 60 penny spike.


SAMPLES OF IDENTIFICATION EXERCISES 
Identification of Materials 


17. Directions: Number your paper from 1 through 8 along the left-hand
margin. Opposite each number write the name of the wood that is hung under
the corresponding number on the panel.

Fig. 11.


Analysis and Identification of Defects in Bells and Buzzers 
18. Directions: Number your paper from 1 through 6 along the left-hand 
margin. Opposite each number write any defect in the bell or buzzer that is 
hung under the corresponding number on the panel. Do not guess. 




Testing for Defective Fuses 


19. Directions: Number your paper from 1 through 6 along the left-hand
margin. Test out the fuses with a test lamp. Opposite each number on the
paper indicate whether the corresponding fuse on the panel is blown or
satisfactory. Do not guess.


69. Technique Exercises. 


Technique exercises are designed to measure a pupil's method of
manipulating tools, machines, instruments, or materials. It is possible
to do work of good quality with poor technique, but great skill
can scarcely be developed without the fundamental techniques with
tools and materials. Shop technique does not lend itself to measurement
with complete objectivity. The technique exercises require the
pupil to do certain things which demand the manipulation of tools and
materials, which then provide a means of rating the major techniques.
Test exercises of technique, like exercises of quality, require thought
and experimentation in their construction but are valuable in
measurement, diagnosis, and teaching.

The teacher will do well to consider the following guiding principles
in the construction of test exercises for measuring technique:

1. Provide activities which will call for the use of tools or
instruments in which technique is to be rated.

2. Give specific directions for doing the work.

3. Provide enough activity ... the various techniques.

4. Provide necessary tools and materials.

5. Rate the techniques by using a rating scale.




TEST EXERCISE ON TECHNIQUE

20. Operation: To saw to a line.
Directions: Saw the board on the line as marked, perpendicular to the surface.

Fig. 14.

Tools and Materials: Ripsaw and cross-cut saw in good condition. Bench,
vise, and a piece of soft wood marked as indicated in the drawing.
Rating Scale: As the pupil makes the cuts with the saw, the following
points are observed and checked. Each item is rated on the
basis of 10, and the score is determined by adding the ratings.

Sawing

1. Clamping stock.
Stock should be held so that it will not be loosened or cracked and
should also facilitate sawing.

2. Starting cut.
With thumb at line, saw should be placed against the thumb. Saw
should be pulled back slowly a few times to make a groove, then pushed
forward.

3. Holding saw.
Saw should be held in right hand. For cross-cut, angle should be 45
degrees; for rip, 60 degrees.

4. Stroke.
Stroke should be long and even, not too fast. Proper angle should be
kept during sawing. Line should be followed.

5. Ending cut.
One should reach over with the left hand and hold on to the piece
being cut off. Saw strokes should be slow with little pressure to prevent
breaking off the end.

8 Sample 20 gives the method of rating a shop technique. The construction
of tests for rating techniques is discussed in more detail in Chapter XI.

70. Speed or Rate of Response Exercises.

Rate of response is of considerable value in trade courses, but of
less importance in the cultural courses of the elementary and junior


high school. Tests of speed will be discussed and illustrated in detail 
in Chapter XI, but it seems desirable to point out here that quality 
must be clearly defined and held nearly constant or rate of response 
cannot be measured accurately. Speed and accuracy are each vari- 
able factors in achievement and performance. Test exercises designed 
to measure speed or rate of response must present a well-defined ac- 
tivity with appropriate standards. When the pupil can do the activity 
and meet the required standards, then he is ready to take the test to 
see how rapidly he can do the problem. Practice will result in a 
pupil’s improving his score up to the point where further increase is 
limited by native ability. The amount of speed a pupil should have 
can be determined by the demands of the job or by comparison with 
the best efforts of others. 

In the construction of exercises designed to measure rate of re- 
sponse the following principles should be observed: 

1. Define the exact work to be done. 

2. Give definite standards, 

3. Give the pupil a chance to achieve the standards and learn 
exactly what they are. 

4. Do the job on a carefully controlled time basis. 

5. Score on the basis of known quality and time. 


SAMPLE RATE OF RESPONSE EXERCISE

21. Operation: Rate of boring holes through %” soft wood. 
Part I 

Preliminary Activity: Bore three ½" holes through the piece of soft wood
given you at the points indicated in the drawing and on the board.

Fig. 15.

Practice boring holes until you can do as well as or better than the sample.

Part II

1. Say to the pupil, "Now that you can bore holes accurately and neatly, we
want to learn how quickly you can bore three holes. Do your work the same
way you did in practicing. Continue to bore holes which are as good as or better
than the sample."

2. Be sure that the pupil has a piece of wood ready for boring and that the
bit is in the brace and that the vise is in working order.

3. Say to the pupil, "Ready, begin." Check the amount of time in seconds
that is required to bore the holes.

4. Score the exercises by retaining each sample of boring that is as good as or 
better than the minimum sample. For example, the pupil bores the three holes 
in 60 seconds. Two holes are satisfactory, but one is inferior, so the pupil’s score 
on the exercise is two holes in sixty seconds. 
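
The scoring rule in step 4 can be set down as a short sketch in Python. This is an illustration only, under the assumption that each hole has already been judged against the minimum sample; the function name `rate_of_response_score` and the list-of-booleans record are hypothetical.

```python
def rate_of_response_score(holes_satisfactory, seconds):
    """Score a timed exercise as (satisfactory samples, seconds taken).

    holes_satisfactory -- one True/False per hole, True when the hole is as
                          good as or better than the minimum sample
    seconds            -- time the pupil needed for the whole exercise
    """
    retained = sum(1 for ok in holes_satisfactory if ok)
    return retained, seconds


# The case described in step 4: three holes in 60 seconds, one inferior.
score = rate_of_response_score([True, True, False], 60)
print(score)
```

The score is kept as a pair rather than reduced to a single figure, since the text reports it as "two holes in sixty seconds" and holds quality and time apart.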


III. CHANCE FACTORS IN OBJECTIVE EXAMINATIONS


71. Guessing in Objective Tests. 


Test exercises of the recognition type, in which one or more sug- 
gested wrong responses accompany the correct response, are definitely 
affected by the factor of chance or guessing. Ordinary recall items 
which call upon the student to initiate and state his response naturally
are not influenced by this factor. Most alternate-response (true-false; 
yes-no) items open up the possibility of a fifty-fifty chance of the 
individual’s guessing the correct answer in all items about which he has 
no information at all. Multiple-response items of the three-, four-, 
or five-response types decrease this probability as the number of alter- 
nate responses is increased. Within certain limits, chance operates in 
matching exercises, and to a smaller degree in exercises using the re- 
arrangement or the classification testing techniques. 

The actual degree to which chance affects a pupil’s score is almost 
impossible to determine. It depends upon the form of the test exer- 
cises and their arrangement in the test. It also depends upon the 
amount of information, or lack of it, which the pupil has concerning 
the specific item. If we reason from an a priori basis, it is quite
apparent that the pupil who is totally ignorant of the facts involved in a
test item has a fifty-fifty chance of guessing the correct response in a
true-false or two-response test. For instance, if an individual were
to respond to a properly balanced true-false test with the exercises
themselves covered with a sheet of paper, by marking at random the
true-false responses along the margin of the paper, this would
represent a situation in which total ignorance of the items actually
operated. If the test were long enough to provide a reasonable sampling,
the resulting score on the test under these conditions should be zero,
since no knowledge of the test content would be called into play in
responding to it. Pure chance would be operating. Under these
conditions the individual should mark almost exactly the same number of
wrong responses as right ones. If the number of right and wrong


responses did not check closely, it would mean that the test itself was 
not properly balanced, or that it was not long enough to be reliable. 
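
The a priori argument can be checked with a little arithmetic. The sketch below is in Python and assumes the score is taken as rights minus wrongs; the function name `expected_blind_marking` is hypothetical. It computes the number of right and wrong responses to be expected, on the average, when every item of a balanced true-false test is marked at random.

```python
from fractions import Fraction


def expected_blind_marking(n_items):
    """Expected (rights, wrongs, rights minus wrongs) when every item of a
    balanced true-false test is marked at random, each mark having one
    chance in two of being correct."""
    p_right = Fraction(1, 2)
    expected_right = n_items * p_right
    expected_wrong = n_items * (1 - p_right)
    return expected_right, expected_wrong, expected_right - expected_wrong


rights, wrongs, score = expected_blind_marking(100)
print(rights, wrongs, score)
```

These are long-run averages. On any single short test the rights and wrongs will seldom balance exactly, which is why the text requires a test long enough to give a reasonable sampling.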

Now in actual practice, there should be very few items in an exam- 
ination about which the student should be totally ignorant and be 
forced to resort to pure guessing, if the test is made up of valid items, 
selected from the actual content which the student has had an oppor- 
tunity to learn. Accordingly, this fringe of knowledge, slight though 
it may be, should enable him to succeed more often than he fails. In 
other words, pure guessing does not operate in the use of a valid alter- 
nate-response test. In theory, guessing would operate to increase the 
score through a lucky guess just as often as it would tend to reduce 
the score through an unlucky guess. The fact that the pupil is never
so ignorant of such a test item as this reasoning would assume, makes
it desirable to conclude that he at least guesses one exercise right for 
each one guessed wrong. This results in reducing his score in the num- 
ber of exercises right by the number of exercises he missed in the test. 

The best available evidence seems to indicate that the apparent
validity of alternate-response tests is increased slightly by assuming
that guessing actually does take place and correcting the score on that
basis. The net result of the application of this type of correction is
possibly to over-correct slightly, but in most cases this is not serious.
Ruch ⁹ has suggested that if correcting for chance in recognition forms
appears to be unsatisfactory, approximately the same effect may be
brought about through increasing the length of the test by the use of
10 to 15 per cent more test items than would be required for the
expected reliability of measurement.
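
As a small worked illustration of the 10 to 15 per cent allowance, the following Python sketch converts a planned test length into the lengthened range. The function name `lengthened_test_range` and the choice to round up to whole items are assumptions, not part of Ruch's suggestion as quoted.

```python
def lengthened_test_range(planned_items):
    """Return (low, high): the planned number of items increased by 10 and by
    15 per cent, rounded up to whole items (integer arithmetic throughout)."""
    low = -(-planned_items * 110 // 100)   # ceiling of planned_items * 1.10
    high = -(-planned_items * 115 // 100)  # ceiling of planned_items * 1.15
    return low, high


print(lengthened_test_range(100))
print(lengthened_test_range(50))
```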

9 Ruch, G. M., The Objective or New-Type Examination, Scott, Foresman and
Company.

72. Correcting for Chance in Objective Tests.

The typical procedure for the correction of exercises for the operation
of chance may be generalized in the following formula:

$$C = R - \frac{W}{N - 1}$$

in which C is the corrected score, R is the number of exercises answered
correctly, W is the number of exercises answered incorrectly,
and N is the number of choices in the exercise. Thus, if there are 5
choices in a multiple-response exercise, the correction is made by
taking 1/4 of the number of wrong responses from the number right. In
alternate-response or true-false tests N equals 2.
Thus, the denominator of the formula becomes 1, and the net result is
to deduct the number of wrong responses from the number answered
correctly. For example, a student in responding to a true-false
examination consisting of 125 items, omits 11 and answers 14 incorrectly.
The number of exercises he answered correctly (R) is found by
subtracting the omitted and incorrectly answered exercises from the total
number of items in the test. 125 − 11 = 114; 114 − 14 = 100, the
number right. The correction for guessing involves taking the wrongs
from the rights (R − W). Accordingly, the corrected score is 100 − 14,
or 86. In cases where the student misses more exercises than he
answers correctly, the practice ordinarily followed is to assign scores of
zero, rather than to show a negative score. Practically, the individual
could scarcely know less than zero, and furthermore, it is likely that
such a situation arises out of the unreliability of the test itself.
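
The correction just described is easily reduced to a short routine. The sketch below, in Python, is one possible rendering; the function name `corrected_score` and its argument names are hypothetical. It follows the text in finding R by subtraction, deducting W/(N − 1), and reporting zero in place of a negative score.

```python
from fractions import Fraction


def corrected_score(total_items, omitted, wrong, n_choices):
    """Score corrected for chance: C = R - W / (N - 1), never below zero.

    total_items -- number of exercises in the test
    omitted     -- exercises the pupil left unanswered
    wrong       -- exercises answered incorrectly (W)
    n_choices   -- number of choices per exercise (N); 2 for true-false
    """
    right = total_items - omitted - wrong              # R
    corrected = Fraction(right) - Fraction(wrong, n_choices - 1)
    return max(corrected, Fraction(0))                 # zero replaces a negative score


# The true-false case worked in the text: 125 items, 11 omitted, 14 wrong.
print(corrected_score(125, 11, 14, 2))
# A hypothetical five-choice test: 50 items, none omitted, 8 wrong.
print(corrected_score(50, 0, 8, 5))
```

Exact fractions are used so that a five-choice test with, say, 7 wrong responses yields 7/4 to be deducted rather than a rounded decimal.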


Attention should possibly be directed once more to the matter of
the specific instructions to be given the student in the use of
recognition-type tests. The best practice, based on a conservative estimate of
the available evidence, seems to be to direct the pupils not to guess in
taking the test, but to correct the resulting scores on the test exactly
as if they had guessed. The only exception to this general rule seems
to arise in the use of double true-false exercises. In this case, it
appears desirable to encourage the student to attempt to answer every
possible exercise in both parts of the test. The method of scoring the
test in terms of pairs right takes adequate care of any tendency or
necessity on his part to resort to pure guessing. Furthermore, unless
the items are utterly invalid, he must have a fringe of information
about many items which he might be tempted to omit under the
conditions of the typical true-false test. Since missing or omitting one or
both of the paired exercises makes it impossible for the pupil to score
on that pair, he should be given the benefit of the doubt and a chance
to score on every pair of exercises.
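
Scoring "in terms of pairs right" can be sketched as follows. This Python example is an illustration under stated assumptions: the function name `pairs_right` is hypothetical, and each pair is recorded as two True/False values showing whether the true form and the false form of the exercise were answered correctly, with an omitted exercise recorded as False.

```python
def pairs_right(pair_results):
    """Count the pairs in which both exercises were answered correctly.

    pair_results -- list of (first_ok, second_ok) tuples, one per paired fact;
                    an omitted exercise is recorded as False
    """
    return sum(1 for first_ok, second_ok in pair_results if first_ok and second_ok)


# Hypothetical paper of four pairs: two pairs fully right, one half right,
# and one with both exercises missed or omitted.
paper = [(True, True), (True, False), (False, False), (True, True)]
print(pairs_right(paper))
```

Because a half-right pair earns nothing, a lucky guess on one member of a pair does not raise the score, which is the sense in which this method cares for guessing.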


IV. TYPES OF ESSAY-TYPE EXAMINATION EXERCISES

Although a general program of measurement of classroom products
in the industrial arts is most likely to be advanced through the
elimination of the subjective features of the teacher's judgment at all
possible points, it must nevertheless be recognized that certain
desirable products of the classroom and shop simply do not lend
themselves to the objective approach. Furthermore, many teachers of
industrial education wish to make use of essay-type tests occasionally
for other reasons. Since this type of test is still used, and probably
always will be to some extent, it is unquestionably desirable to point
out here some of the possibilities and limitations of this type of
measurement, as well as to submit certain suggestions which if carefully
utilized may result in the distinct improvement of the less objective
methods of measurement.

Essay-type examinations, though generally not so reliable as the 
average objective examination, frequently secure measures which are 
just as valid as they would be if stated in objective form. The lack 
of reliability in the essay or traditional examination lies mainly in the 
limited extent of the sampling which the use of this form of question 
permits, and in the lack of objectivity in scoring the items. Many
examinations composed entirely of essay questions are valid in the
general sense of the term. Their limitation results from the
incompleteness of the sampling taken and from the uncertainty with which
the results are evaluated.


73. Traditional Examination Questions. 


The traditional or discussion-type examination is almost uniformly 
made up of recall questions. The following types are representative: 
I. SIMPLE RECALL.

Samples:
1. Name four different types of wood stains.
2. Name the different grades of sandpaper used in woodfinishing.
3. Name the ingredients of paste wood filler.

II. DESCRIPTION.

Samples:
1. Why is walnut a good cabinet wood?
2. What are the chief characteristics of red wood?
3. Why is balsa a favored wood for constructing model air craft?
4. What are the characteristics of quarter-sawed oak?

III. COMPARISON AND ANALYSIS.

Samples:
1. What is the difference between a superheterodyne circuit and a
radio-frequency circuit?
2. How does a dynamic speaker differ from a magnetic one?
3. What is the difference between an inside aerial and an outside
aerial?
4. How does a battery set differ from an a-c set?
5. Is there a difference in the underlying principle of head phones and
a magnetic speaker? Explain.




IV. PROCEDURE. 


Samples: 
1. Give the steps in applying a rubbed varnish finish. 


2. Give the steps in squaring a board. 
3. Give the procedure for fuming oak. 
4. Give the procedure for tinning and soldering copper. 


74. Constructing Essay-Type Exercises. 

On first thought the essay-test exercise seems easier to prepare 
and use than the objective type. In the way the traditional examina- 
tion is ordinarily used, or perhaps we should say misused, it does take 
less time to prepare, but the results obtained are much more
subjective and unreliable than those from objective tests. If the
essay-type test is constructed so as to give a fairly reliable result it is not
easier to construct than the objective test exercise. In fact, it may
demand a great deal more time and careful thought.

The following rules have been found very helpful in the construction
of essay-type questions.

1. State the question in a simple, direct manner so that it demands
the reproduction, comparison, or evaluation of a specific unit of
instructional material.

EXAMPLE:
Poor. Name all the stains you can.
Better. Name four types of wood stains.

2. Write out just exactly the answer that is expected for each
essay question in the test. This may be either in outline, brief
paragraph, or diagram.

EXAMPLES:
1. Name four types of wood stains.

Teacher's answer: 1. Water stains.
2. Oil stains.
3. Spirit stains.
4. Chemical stains.

2. Why is walnut better suited to cabinet work than fir?

Teacher's answer: Walnut is better suited for cabinet work because of its
natural beauty, color of the wood, close grain, and durability; it is stronger
than fir, does not splinter as readily and takes a better finish.


75. Scoring Essay-Type Test Exercises.

The more objectively the essay test exercises can be scored, the less
the results will be influenced by the personal judgment of the scorer.
The following suggestions have been found valuable for use in
correcting essay-type exercises:

1. Tests should be scored by the one who makes out the questions.
He should know exactly what responses are intended and write
them down.




2. Each pupil taking the test should write his name on the back 
of the test paper, and the scorer should disregard the name until the 
test is scored. This eliminates the subjective factor of being influenced 
or biased in judgment because of former contacts with the pupil. 

3. The scorer should not mark off for misspelled words, sentence
structure, paragraphing, poor writing, etc. Similarly, he should not
increase the score for excellence in these things. However, such
factors may be indicated or checked on the examination. The reason for
this is that the test is to measure the pupil's knowledge of certain
information in an industrial education course. If it is desirable to test
a pupil's ability to write, spell, or use correct written English, suitable
tests should be given for this purpose which are valid and reliable.

4. Essay test exercises can be corrected most simply by correcting 
each item in all the tests rather than by correcting the entire tests 
separately. This enables the scorer to concentrate on the answer to 
one test exercise and thus he is better able to judge the merits of the 
several pupil responses to the same question. 

5. Rate each question on a scale of 10 or 20 and then add the
ratings on all the essay test exercises to get the mark for the paper.
This method helps to objectify the score. The score is based on a
number of careful judgments rather than on one complex judgment
for the entire score.
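
The arithmetic behind suggestions 4 and 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper identifiers, the ratings, and the function name total_marks are hypothetical, and a 10-point scale is assumed for every question.

```python
from typing import Dict, List


def total_marks(ratings_by_question: List[Dict[str, int]],
                scale: int = 10) -> Dict[str, int]:
    """Add per-question ratings into one mark per paper.

    ratings_by_question[i] maps an anonymous paper id to the rating
    (0..scale) given to that paper's answer to question i.
    """
    totals: Dict[str, int] = {}
    # Work through one question at a time across all papers
    # (suggestion 4), then accumulate the ratings (suggestion 5).
    for question in ratings_by_question:
        for paper_id, rating in question.items():
            if not 0 <= rating <= scale:
                raise ValueError(f"rating {rating} is outside 0..{scale}")
            totals[paper_id] = totals.get(paper_id, 0) + rating
    return totals


# Hypothetical ratings for two papers on three questions.
ratings = [
    {"paper-1": 8, "paper-2": 5},   # question 1
    {"paper-1": 6, "paper-2": 9},   # question 2
    {"paper-1": 10, "paper-2": 4},  # question 3
]
marks = total_marks(ratings)
```

Keying the ratings by an anonymous paper id rather than by pupil name follows suggestion 2: the scorer does not see whose paper is being rated until the marks are totaled.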


The essay-type exercise can be made much more objective and the 
subjectivity of the teacher’s marks can be significantly reduced if a 
method similar to the one outlined in the preceding paragraphs is fol- 
lowed. However, certain limitations of the essay-type question are 
obvious. Frequently the time gained by the teacher in preparation is 
lost in scoring. At best, the essay-type test is not as valid or reliable 
as an average objective-type test. Research studies have shown the 
reliability to be around .59 on an average. 

Kelly¹⁰ and Fauber and Ruch¹¹ have shown that subjectivity of
teachers' marks can be reduced significantly through the use of scoring
rules. But even if the subjectivity of teachers' marks could be reduced
by half, they still would not provide measures which are nearly so
reliable as those obtained from objective tests. Ruch states that
"Experience and experiment have shown that the results of an essay
examination cannot be evaluated fairly by human minds."¹² In addition
to the psychological difficulty of making a complex judgment,
there is also the serious disadvantage of limited sampling, which has
been mentioned previously. The average objective tests will sample
at least five times as widely into a field of information as an essay
examination requiring the same testing time.

¹⁰ Kelly, F. J., Teachers' Marks ..., No. 66, p. 83, Columbia University ...

¹¹ Unpublished master's thesis, 1926, University of Iowa.

¹² Ruch, G. M., The Objective or New-Type Examination,
Scott, Foresman and Company, Chicago, 1929.


SUMMARY 

The more important techniques for use in the construction of traditional
examinations and informal objective tests are summarized in
this chapter. The differences between the standardized test and the
informal objective examinations are pointed out.

The problems involved in controlling the chance or guessing factor
in certain forms of objective tests are treated briefly in this chapter
because of the close relation of this factor to the technique of
reliability of measurement used. Recognition is given to the fact that
not all the measurement that goes on in the classroom and shop should
be objective.


SUMMARY EXERCISES FOR DISCUSSION 


1. Why are many of the paper-and-pencil tests which are useful in other
educational fields not well suited to the demands of objective measurement
in industrial subjects?
2. Illustrate by example each of the main types of objective exercises
suited for use in measurement in industrial arts courses.
3. What are the main advantages and disadvantages of objective examinations?
4. Show how the general formula for correcting for guessing in objective tests
actually works in an alternate-response test and in a five-response test.
5. Evaluate the suggestions for improving the objectivity of scoring of
essay-type exercises.
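
As a starting point for Exercise 4, the sketch below assumes that the general formula meant is the conventional correction for chance, S = R - W/(n - 1), where R is the number right, W the number wrong, and n the number of responses offered per item. The formula itself is not printed in this section, so that reading is an assumption, as are the function name and the sample figures. Omitted items count neither as right nor as wrong.

```python
def corrected_score(rights: int, wrongs: int, choices: int) -> float:
    """Conventional correction for guessing: S = R - W / (n - 1)."""
    if choices < 2:
        raise ValueError("an objective item needs at least two responses")
    if rights < 0 or wrongs < 0:
        raise ValueError("counts cannot be negative")
    return rights - wrongs / (choices - 1)


# Alternate-response (true-false) test, n = 2: the score is rights minus wrongs.
true_false = corrected_score(rights=30, wrongs=9, choices=2)

# Five-response test, n = 5: only one-fourth of the wrongs is deducted.
five_response = corrected_score(rights=30, wrongs=8, choices=5)
```

With two responses each wrong answer cancels one right answer; with five responses four wrong answers are needed to cancel one right answer, since a blind guess succeeds less often.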
SELECTED REFERENCES 


BUCKINGHAM, B. R., Research for Teachers. New York: Silver, Burdett and
Company, 1926.

... and RUCH, G. M., "On Corrections for Chance in Multiple-Response
Tests," Journal of Educational Psychology, Vol. 18: 48-51, 1927.

GREENE, CHARLES E., "New Type Tests," Research Monograph No. 3, Denver,
Colorado, 1926.

GREENE, H. A., and JORGENSEN, A. N., The Use and Interpretation of
Educational Tests. New York: Longmans, Green and Company, 1929.

LANG, ALBERT R., Modern Methods in Written Examinations. Boston: Houghton
Mifflin Company, 1930.

ODELL, C. W., Traditional Examinations and New-type Tests. New York: The
Century Company, 1928.

ODELL, C. W., Educational Measurement in High School. New York: The
Century Company, 1930.

ORLEANS, J. S., and SEELY, G. A., Objective Tests. Yonkers: World Book
Company, 1928.

RUCH, G. M., The Objective or New-type Examination. Chicago: Scott,
Foresman and Company, 1929.

SMITH, H. L., and WRIGHT, W. W., Tests and Measurements. New York: Silver,
Burdett and Company, 1928.

SYMONDS, P. M., Measurement in Secondary Education. New York: The
Macmillan Company, 1927.

TOOPS, H. A., "Trade Tests in Education," Teachers College, Columbia
University, Contributions to Education, No. 115, 1921.

WEIDEMANN, C. C., "How to Construct the True-False Examination," Teachers
College, Columbia University, Contributions to Education, No. 225, 1926.

WILSON, G. M., and HOKE, K. J., How to Measure. New York: The Macmillan
Company, 1928.

WOOD, BEN D., Measurement in Higher Education. Yonkers, New York: World
Book Company, 1923.


CHAPTER XI 
CONSTRUCTION AND USE OF INFORMAL SHOP TESTS 


I CONSTRUCTION OF AN INFORMAL OBJECTIVE TEST 


76. Steps in Building an Objective Information Test. 

In contrast with certain other school subjects, the industrial arts
subjects present two widely different phases of achievement for
measurement. One of these phases is expressed in terms of the ability of
the individual student to go into the shop, and, by following specific
directions, come out with a product of a given quality which is itself
evidence of achievement. This is a test of performance. The other is
expressed in terms of knowledge of facts and their relationships which
may lie back of the student's actual performance. This is a test of
information. The performance test calls for direction, action,
production. The information type is usually a paper-and-pencil test. It is
obvious that the best possible test of information cannot be wholly
valid for any industrial education course because of the fact that it
deals only with information and does not measure such factors as
quality or rate of response, techniques, and personality traits, which
are unquestionably important elements of the course. Both are essential
to complete measurement of accomplishment in this field. The
testing techniques for both types of tests are discussed in the
preceding chapter. The steps in constructing informational types of tests,
and in deriving rating scales for the evaluation of the quality of
products obtained under performance testing conditions, are set forth in
this chapter.

The distinctive feature of the teacher-made objective examination
which makes it especially useful in the evaluation of classroom
achievement is the closeness with which its content can be made to
parallel the subject-matter actually taught to the class. This is
merely another way of stating that its validity is high in proportion
to the extent that the teacher includes in the examination exercises
sampling from facts which the students have had an opportunity to
learn. A teacher who knows his pupils and his subject-matter may
readily construct an objective examination which will have all the
advantages of the standardized test (except the standards or norms
themselves), with few or none of the limitations. Critical selection of
items from the important phases of the subject which have been given
instructional emphasis will guarantee the validity of the test.
Observance of the simple principles of formulating objective exercises
will produce an objective test. A wide sampling over the significant
phases of the subject will produce a test long enough in terms of
items and working time to secure reliable results. Teacher-made tests
which meet these three criteria leave little to be desired.


77. Securing Validity.

Objectivity and reliability in an examination are qualities which
are functions of the form of the exercises used and the breadth of
sampling of items taken. That is to say, they are the results of the
application of certain principles of measurement which can be learned
by any classroom teacher. The first constructive step in the
development of the informal objective examination is the establishment of a
basis for the validation of its content.

Course of Study as the Basis for Validity. The validation of an
examination presumes a knowledge of exact details of the curricular
material to be taught, as well as a background of experience and
judgment adequate for a critical evaluation of the social and practical
significance of the various units of instruction. This means, clearly
enough, that the teacher must have an intimate knowledge of the
content of the course of study.

A course-of-study outline in woodworking is presented. The course
outline is not put forward as ideal, but rather as a body of information
and skills affording materials suitable for illustrating several types
of tests useful to the industrial education teacher. The illustration
from woodworking is used here mainly because it is the most widely
taught and best understood instructional division in industrial
education, and because practically all the testing techniques suitable
for use in this field may be applied to other industrial education
subjects.

Course-of-Study Outline in Woodworking




OBJECTIVES OF THE COURSE 


1. To develop an appreciation of good materials and workmanship. 
2. To develop handyman abilities with common tools and materials. 
3. To develop hobbies for leisure-time activities. 

4. To further intelligent choice of life occupations. 

5. To give information about the industries and their workers. 

6. To develop desirable social traits and attitudes. 

7. To provide opportunity for planning and problem solving. 

8. To motivate and vitalize academic learning. 


What the boys should be able to do with woodworking tools ¹

1. To use a rule in measuring.
2. To use dividers or compass for laying out curves and dividing spaces.
3. To use a try-square for testing.
4. To adjust a plane.
5. To square a piece of stock.
6. To saw to a line with a rip or cross-cut saw.
7. To use back saw.
8. To use coping saw.
9. To bore holes in wood.
10. To fasten with screws.
11. To trim or pare with a chisel.
12. To use scraper.
13. To use sandpaper.
14. To drive and draw nails.
15. To lay out and cut a chamfer.
16. To glue up work.
17. To fit hinges.
18. To make butt joint.
19. To make dowel joint.
20. To sharpen edge tools.


What the boys should know about wood and the divisions of the industry

1. Know the principal characteristics, working qualities, principal uses, and
sources of supply of the following woods: pines, cypress, oak, walnut, ash, birch,
maple, mahogany, red cedar, hickory, gum, chestnut, and poplar.
2. How lumber is cut and milled.
3. Standard dimensions of lumber.
4. Knowledge of veneer and plywood.
5. Kinds of glue and its preparation.
6. Kinds of nails and their uses.
7. Kinds and sizes of screws.
8. Kinds and grades of sandpaper.
9. Grades and uses of steel wool.
10. Distinguishing characteristics of period furniture.
11. Basic principles of good design in furniture.
12. Use of common types of hinges and fasteners on woodworking projects.
13. Kinds of grinding and sharpening stones.
14. Location of manufacturing concerns and labor conditions.

¹ Adapted from the A. V. A. Committee's Report on Standards in Industrial
Arts.


What the attitude of the boys should be ²

1. Industrious.
2. Cooperative.
3. Self-reliant.
4. Considerate of the rights of others.
5. Ready to assume responsibility.
6. Loyal.
7. Fair minded.
8. Optimistic toward life.
9. Law abiding.
10. Appreciative of duty in common things.

² This unit is the same for all shop subjects, and probably for the entire
curriculum.

Selection of Major Groups of Informational Items. The next step
in the validation of the content of an informal objective test of
information is to select, in the light of the objectives set up for the course,
the groups of skills which are informational in character and which
can be measured by means of a paper-and-pencil test. The following
summaries represent the major groups of such informational aspects
found in the foregoing outline on woodworking:

1. Different types of planes.
2. Different types of saws.
3. Sizes of wood bits.
4. Sizes of screws.
5. Kinds and sizes of chisels.
6. Procedure in squaring stock.
7. Sizes of sandpaper.
8. Sizes of nails.
9. Glue and its use.
10. Different types of hinges.
11. Kinds of wood stain.
12. Types of fillers.
13. Different types of brushes.
14. Composition of shellac.
15. Enamel and its composition.
16. Varnish and its composition.
17. Different kinds of paint.
18. Composition of wax.
19. Composition of lacquer.
20. Common joints.
21. Steps in applying stain, filler, shellac, varnish, enamel, paint, wax, and
lacquer.
22. Steps in squaring stock, preparing glue, using sandpaper, sharpening wedge
edge tools, boring holes, and fastening with screws.
23. Principal characteristics and uses of common woods.
24. Dimensions of lumber.
25. How lumber is cut and milled.
26. Veneer and plywood.
27. Grades and uses of steel wool.
28. Principles of design.
29. Characteristics of widely known period furniture.
30. Types of grinding and sharpening stones.
31. Manufacturing concerns and labor conditions.


Suggestions for Securing a Valid Sampling of Informational Content.
The specific problem of this discussion is to demonstrate how it
is possible to secure a valid sampling of the informational content of
the course. The following rules will be found very helpful in
accomplishing this:

1. Keep clearly in mind the objectives of the course. Try to for- 
mulate questions which will measure the extent to which the objectives 
have been achieved. Emphasize the relative and social utility of the 
subject-matter and avoid purely factual questions unless they are 
essential to building up concepts. 

2. Ask questions which the objectives indicate are of most im- 
portance, but under no circumstances ask questions included merely 
to “stump” the pupils. Trick questions, and unusually difficult ones, 
are only dead weight in the test, waste valuable testing time, and in 
general lower the validity of the test. 

3. Ask a large number of questions over all parts of the course.
The different types of objective test exercises are best suited for testing
a large number of items in the time ordinarily allotted to measurement.

4. Have other teachers make suggestions as to the importance of
the exercises selected for the test. Take into consideration the
comments of pupils as to the value of the different test items. If the pupils
consider them unfair, obscure, or too easy, they should be eliminated
or modified before the test is used again.

5. The test cannot be more valid than the course of study on 
which it is based. The progressive teacher will revise his course and 
tests from time to time to bring the work abreast with good practice 
and the results of curriculum research. 

In establishing validity it is a good policy to construct 200 or 250
test exercises based on the course. This furnishes sufficient material
to eliminate undesirable test exercises and still have adequate material
for at least two forms of the test. One form is sufficient for a valid
examination, but two forms make it more valuable. The second form
may be used for testing those absent from the first test, or the test
may be used from year to year, alternating the two forms.


78. Securing Objectivity.

After the major informational topics have been agreed upon in the
light of the teaching objectives and, when possible, passed upon by
other teachers, the items can be expanded and developed into objective
test exercises. It is a good policy to use the type of objective
exercise which best fits the material, rather than to attempt to make all
exercises conform to a single type, as true-false, multiple-choice, etc.
Chapter X gives examples of test exercises which have been found
valuable in testing information in shop courses.

Two fairly satisfactory methods of procedure are suggested for
recording the test exercises as they are developed. In one, the teacher
may write the exercises on sheets of paper allowing a half inch between
each question, so that, after the questions have been formed, the paper
can be cut into strips with one question on a strip. These strips can
be shifted to eliminate the less desirable ones according to the
teacher's plans. Similar types of test exercises ... in manipulation of
the items. ... use 3 inch by 5 inch cards and a card index. ... a
separate card, and the cards ... test exercise used. The first draft of
the ... he has given them critical analysis. ... eliminated as desired.
The authors have ... handier and neater but a little more expensive.

... because they do not know the correct responses to the exercises. If
many pupils fail to understand what to do, it is probable that the
instructions are at fault. If this is not corrected the reliability of the
test will certainly be lowered. The directions which accompany the
samples of objective tests presented later in this chapter are examples
of adequate statements.


79. Rating Exercises as to Difficulty. 


After the questions have been developed and the directions and 
practice exercises perfected, the next step is to arrange the different 
groups of test exercises in the approximate order of difficulty from 
easiest to most difficult. This can be done roughly through inspection 
and rearrangement of the exercises by the teacher. If several teachers 
pool their judgments of the rankings from easiest to most difficult, the 
results will be more reliable. This arrangement of items in order of 
difficulty can be further refined after the test is given, by recording the 
number of pupils who respond correctly to the various items. Order of 
difficulty is quite important in a test because it saves the pupils’ 
time and secures from them a better psychological reaction. The pupil 
is given an opportunity to answer first the exercises that are easier for 
him and he is not so likely to use all the testing time on difficult items 
and fail to answer many that he does know. The arrangement of items 
on the basis of difficulty probably increases the apparent reliability of 
the test. 
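
The refinement described above, reordering the items once the number of correct responses is known, can be sketched as follows. The item numbers and counts are hypothetical, and the function names are illustrative only; ties are broken by the original item number, which is an assumption the text does not address.

```python
from typing import Dict, List


def order_by_difficulty(correct_counts: Dict[int, int]) -> List[int]:
    """Return item numbers from easiest to most difficult.

    correct_counts maps each item number to the number of pupils who
    answered it correctly; more correct responses means an easier item.
    """
    return sorted(correct_counts, key=lambda item: (-correct_counts[item], item))


def revised_positions(correct_counts: Dict[int, int]) -> Dict[int, int]:
    """Map each original item number to its place in the revised order."""
    ordered = order_by_difficulty(correct_counts)
    return {item: place for place, item in enumerate(ordered, start=1)}


# Hypothetical tallies for five items answered by a class of 50 pupils.
counts = {1: 31, 2: 44, 3: 27, 4: 48, 5: 35}
revised_order = order_by_difficulty(counts)
positions = revised_positions(counts)
```

Here item 4, answered correctly by 48 of the 50 pupils, would be moved to first place, and item 3, with 27 correct, to last.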


80. Rearranging Items on the Basis of Difficulty. 


The following true-false statements, taken from a longer test, are
in the order in which they appeared when the test was first given. After
the test was given and the pupils' responses were analyzed, a better
order of arrangement in the test was possible. The numbers at the
right indicate the order of increasing difficulty based on an analysis of
the responses of 50 pupils. The exercise numbered "1" is the easiest
item, i.e., was answered incorrectly by the smallest percentage of the
class.

ORIGINAL ORDER                                              REVISED ORDER
1. Wipe moisture off of tools before putting them away.
2. The marking gauge is used to make a line parallel to an edge.
3. Sandpapering is done to get a smooth surface for finishing.
4. Stain may be applied with a cloth or a brush.
5. To produce a good surface for finishing, sandpaper across the
grain.
6. Varnish is thinned with alcohol.                               1
7. Good paint preserves the wood.
8. Paint and varnish may be applied satisfactorily on damp
surfaces.                                                         3
9. Auger bits are graduated or numbered in sixteenths of an inch. 4
10. To bore a clean hole with an auger bit, bore through until the
spur shows, and then finish boring from the other side.           5

It will be noted that only one of the easiest questions as indicated by
an analysis of 50 pupil responses is in the first five items as listed in
the original test arrangement.


81. Securing Reliability. 


The next essential in developing an informal objective examination 
is to make it long enough to secure an acceptable reliability of meas- 
urement. Reliability is obtained by sampling over a wide range of 
content and by stating a large number of valid questions in objective 
forms which are within the mental and educational range of the pupils 
to be tested. Properly constructed objective tests of 75 to 100 or more
exercises are usually highly reliable, whereas the ordinary six-, eight-,
or ten-question essay examination, with its limited sampling and
subjective scoring, is almost never sufficiently reliable.

Sampling as a Factor in Reliability. The brevity of the statement 
and the ease with which the response is recorded make it possible for 
the student to respond to many more objective exercises in a specified 
period than to those of the discussion types. This makes it possible 
for the objective examination to cover a much wider area of subject- 
matter, or to cover a given area a great deal more intensively than is 
possible with the other type of exercise. The manner in which this 
factor of sampling operates to protect both the pupil and the teacher 
against the injustices of unreliable measurement is shown very clearly 
in Fig. 3, page 35, and is discussed in detail on pages 34 and 35. 

Specific Hints on Securing Reliability in an Examination. The
following suggestions have been found useful in securing high
reliability in objective informal tests:

1. Include from 50 to 100 items, each item being selected from
definite units covering the entire area of the unit of the course. This
step is closely related to securing high validity, but is considered here
from the standpoint of reliability alone.

2. Make the questions objective in type. This eliminates the
variable factor of the teacher's subjective judgment and gives assurance
that all responses will be rated on the same basis.

3. Eliminate the dead weight from the test. Do not include items
which are so easy that over 80 per cent of the class answer them
correctly. Do not include items which are so difficult that less than
20 per cent of the class give the correct response. It is probable that
test items which are missed by 80 per cent or more, or are missed by
only 20 per cent or less of the class, do not differentiate pupil
accomplishment adequately. These items can be determined by short tests
during the term before they are put into the final test, or they can be
eliminated after the test has been used once.

4. Control the conditions for giving the test. Define specific direc- 
tions and conditions for administering the test. 

5. Provide a key with the correct responses. It may be necessary 
to modify or give alternate answers on some completion exercises. 
The key, like other phases of the test, can be refined best after the test 
has been given. 
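
Suggestion 3 lends itself to a short worked example. The sketch below is only an illustration: the item numbers and tallies are hypothetical, and the function name keep_item is not from the text. It follows the first statement of the rule, dropping items answered correctly by over 80 per cent or by less than 20 per cent of the class, so items falling exactly on either limit are kept; that treatment of the boundary is an assumption.

```python
from typing import Dict, List


def keep_item(correct: int, class_size: int) -> bool:
    """True when 20 to 80 per cent of the class answered the item correctly."""
    if class_size <= 0 or not 0 <= correct <= class_size:
        raise ValueError("correct must lie between 0 and class_size")
    # Integer comparison avoids rounding trouble at the 20 and 80 per cent limits.
    return 20 * class_size <= 100 * correct <= 80 * class_size


def items_to_keep(correct_counts: Dict[int, int], class_size: int) -> List[int]:
    """Return, in item-number order, the items that are not dead weight."""
    return [item for item in sorted(correct_counts)
            if keep_item(correct_counts[item], class_size)]


# Hypothetical tallies from a class of 50 pupils.
counts = {1: 46, 2: 40, 3: 25, 4: 9, 5: 10}
kept = items_to_keep(counts, class_size=50)
```

In this example item 1 (92 per cent correct) and item 4 (18 per cent correct) would be set aside, while items 2, 3, and 5 remain in the test.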

82. Sample Objective Tests on Information Aspects of Elementary 
Woodwork. 

A sample of the results of following through the steps in the
construction of an objective examination as outlined in this chapter is
shown in this section. The specimen is an experimental form of an
objective test in woodworking which has been prepared and used by
one of the authors in connection with his shop work. This test is in
four parts requiring a total testing time of 42 minutes and having a
possible total score of 94 points. Part I consists of 39 true-false items;
Part II of four exercises in procedure-arrangement with a total point
score of 22 points; Part III of 24 completion exercises; and Part IV
of 9 multiple-response items. The reliability of this test based on 100
cases is .84. The time requirements and the total possible score
for each part are given in Table 26.


TABLE 26
CONTENT OF OBJECTIVE EXAMINATION

Part     Type of Exercise     Time Allowance (minutes)     Possible Total Score
I        T-F                  15                           39
II       Pro.-Arr.             8                           22
III      Compl.               12                           24
IV       M-R                   7                            9
Total                         42                           94
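
The totals stated in the text, 42 minutes of testing time and 94 possible points, can be checked by adding the figures given for the four parts. The short sketch below does only that addition; the variable names are illustrative.

```python
# (part, type of exercise, minutes allowed, possible score) for each part
parts = [
    ("I", "True-false", 15, 39),
    ("II", "Procedure-arrangement", 8, 22),
    ("III", "Completion", 12, 24),
    ("IV", "Multiple-response", 7, 9),
]

total_minutes = sum(minutes for _, _, minutes, _ in parts)
total_points = sum(points for _, _, _, points in parts)
```

The sums agree with the 42 minutes and 94 points given in the description of the test.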




DIRECTIONS TO PUPIL


This test is divided into four parts. Specific directions are given for each
part. The amount of time allowed is indicated below the directions. You are
to stop working on each part when the teacher calls time. Do not begin the next
part until the teacher reads the directions and gives the signal to begin work.

Do not waste time on a question you know nothing about; skip it, and go
on to the next one. Do not guess.

You are now ready to study the directions for Part I. You are allowed 15
minutes to complete Part I. Do not ask questions about the test after you begin
work. If you break your pencil or need an eraser ask the teacher for one.


PART I. TRUE-FALSE


The following statements are to be answered by drawing a circle around the
capital letter T or F which follows the statement. The letter T stands for true,
and the letter F for false, or untrue.

If the statement is true, draw a circle around the T; if false, draw a circle
around the F. The sample exercises are answered correctly.

Sample: Nails are made of wood.     T  (F)
        Nails are made of metal.   (T)  F

You are allowed 15 minutes for Part I. Wait for the signal!


Exercises * 
1. The surface of wood is planed to make it smooth. T F
2. Before putting away tools always wipe off any moisture that may
be on them. T F
3. The marking gauge is used to make a line perpendicular to an
edge. T F
4. Sandpapering on edges and surfaces is done in the direction of the
grain. T F
5. Sandpapering is done to get a smooth surface suitable for finishing. T F
6. Sandpaper should be wrapped around a block when sanding flat
surfaces. T F
7. Wood is stained in order to improve its appearance. T F
8. Stain may be applied with a cloth or a brush. T F
9. A well-made glue joint is weaker than any other part of the wood. T F
10. To obtain a good surface for finishing, sandpaper across the grain. T F
11. No. 0 sandpaper is coarser than No. 00. T F
12. Varnish is thinned with alcohol. T F
13. A first-class job of finishing can be had with only one coat of
varnish if it is put on thick enough. T F
14. Varnish can be smoothed down with fine steel wool. T F
15. In rubbing down varnish with powdered pumice stone, the pumice
should be rubbed on dry. T F
16. A drawing board is made of oak, or some other hard wood. T F
17. The visible lines of an object are shown on a drawing by solid
black lines. T F
18. The tee-square is used for making horizontal lines. T F
19. Thumbtacks should be hammered into place on the drawing board. T F
20. Templates are used to make, or mark out, a shape on a board. T F
21. The mortise of the mortise-and-tenon joint is the rectangular hole
into which the tenon fits. T F
22. Good paint preserves the wood. T F
23. Paint and varnish may be applied satisfactorily on damp surfaces. T F
24. New wood soaks up much linseed oil. T F
25. New wood should have a priming coat applied before the finish
paint is put on. T F
26. Paint and varnish are made from the same materials. T F
27. It is not necessary to brush paint out well when applying it. T F
28. If shellac is too thick, thin it out with turpentine. T F
29. Shellac is a slow-drying finish. T F
30. Shellac makes a waterproof finish. T F
31. It is easier to clean out a brush after varnishing with it if it is
allowed to dry for 24 hours. T F
32. After paste wood filler has been applied, the surface must be rubbed
across the grain. T F
33. Hand screws should be adjusted before any glue is applied to the
pieces to be glued. T F
34. Hand screws hold best if the jaws are parallel to each other. T F
35. It is easier to drive in a screw if the screw-driver has a round tip,
than if it has a square one. T F
36. Auger bits are graduated or numbered in thirty-seconds of an inch. T F
37. To bore a nice clean hole with an auger bit, bore through until the
spur shows, and then finish boring from the other side. T F
38. A tee-bevel square is used when one wants to lay out an angle. T F
39. More accurate work can be done if knife lines are used, rather than
pencil lines. T F

Stop and wait for the directions for Part II!


PART II. PROCEDURE-ARRANGEMENT


On this page are several jobs, and the steps necessary for doing the job.
However, the steps are not placed in the correct order.

Decide which step should be done first, and place the number of the step in
the first parenthesis, then the number of the second step in the second
parenthesis, and so on until all the steps are down. The sample exercise is
answered correctly.

Sample: To apply stain.
1. Let stand for 2-3 minutes.
2. Apply stain.
3. Smooth the surface with sandpaper.
4. Wipe off excess stain with cloth.
5. Select a suitable stain.
(3) (5) (2) (1) (4) correct order.

You are allowed 8 minutes for Part II. Wait for the signal!


Exercises 


1. List the following grades of sandpaper according to coarseness, placing the 
finest grade first. 

. No.0. 

. No.%. 

. No.2. 

. No.1%. 

. No.00. 

No.1. 


@ kou Quat sl el ol 5 
2. To square up a board.
1. Plane a surface true; mark it No. 1.
2. Plane one edge square with No. 1; mark it No. 2.
3. Gauge and plane to thickness, square with edges and ends; mark No. 6.
4. Cut to width and square other side with No. 1 and 3, and mark No. 5.
5. Plane one end, and square with No. 1 and 2; mark No. 3.
6. Cut to length, square other end with No. 1 and 2; mark No. 4.
( )... ( )... ( )... ( )... ( )... ( )
3. Apply paint on new wood. 
1. Apply first coat of finish paint. 
2. Shellac the knots. 
3. Apply second coat of finish paint. 
4. Clean off any grease or dirt with cloth wet in benzine. 
5. Apply coat of priming paint. 
( )... ( )... ( )... ( )... ( )
4. To bore a hole with a brace and bit.
1. Fasten bit in brace.
2. Withdraw bit and finish boring from opposite side.
3. Mark location of hole.
4. Bore through until spur shows on other side.
5. Select proper size bit.
( )... ( )... ( )... ( )... ( )

Stop and wait for the directions for Part III!


PART III. COMPLETION EXERCISES

Each of the following statements has one or two words left out. When the
correct word or words are inserted in the blanks left for them, the sentences
are specific and complete. The sample exercises are answered correctly.

Samples:

Nails are driven with a hammer.

Screws are driven with a screw-driver. Here screw-driver is the correct word.

[...] and write it in the blank space left for it.

You are allowed 12 minutes for Part III.
Wait for the signal!




Exercises 


1. A fine ______ is used for whetting a plane blade.

2. Sharpening chisels and plane irons on an oilstone removes the ______.

3. The thickness of a shaving is regulated by the ______.

4. The ______ holds the plane blade in place.

5. When plane is not in use lay it on its ______.

6. In starting a shaving cut with a plane, press ______ upon the knob of the
plane.

7. In driving nails, at first use ______ steady strokes.

8. Inserting a wood block under the hammer when pulling nails prevents
______ the wood.

9. The ______ should be used to guide the saw when starting a cut.

10. Saws work easier when rubbed with ______ occasionally.

11. An auger bit is inserted into the ______ of the brace.

12. The number on the tang or shank of a bit indicates its ______.

13. A ______ is used when boring a number of holes the same depth.

14. The teeth of a coping saw blade should point ______ handle.

15. The cross-cut saw is used for cutting ______ the grain.

16. The tool used for setting nails below the surface is called a ______.

17. The cutting action of a ripsaw is like that of a number of ______.

18. The cutting action of a cross-cut saw is like a number of ______.

19. The ripsaw is used for cutting ______ the grain.

20. A ______ cornered file is used to sharpen saws.

21. Damp spongy wood requires a saw with plenty of ______.

22. The ______ works as a crank and holds the bit when boring.

23. The size of an auger bit in ______ of an inch may be found on the tang
or shank.

24. The cut made by a saw is called a ______.


Stop and wait for the directions for Part IV! 


PART IV. MULTIPLE-CHOICE EXERCISES

Each of the statements below is answered correctly by one of the words
following the sentence.

Determine which of the choice of words correctly answers the statement, and
write the number of that word on the line at the end of the exercise. The
sample exercise is answered correctly.

Sample: Sandpaper is made up of paper, glue, and
(1) brick dust, (2) sand, (3) emery, (4) gravel.   __2__

Sand is the correct answer, and the number of the word (2) is written on the
line.

You are allowed 7 minutes for Part IV.
Wait for the signal!




Exercises 


1. Shellac is thinned with
(1) benzine, (2) turpentine, (3) water, (4) alcohol.
2. Oil stain is mixed with
(1) alcohol, (2) turpentine, (3) water, (4) linseed oil.
3. Red cedar is the best wood for
(1) dressers, (2) lamps, (3) chests, (4) bookracks.
4. Joinery is used in
(1) plastering, (2) plumbing, (3) cabinet making, (4) bricklaying.
5. A good liquid to rub on tools to prevent rusting is
(1) kerosene, (2) water, (3) machine oil, (4) turpentine.
6. A working drawing of an object shows the
(1) corner view, (2) top view, (3) rear view, (4) bottom view.
7. Brushes that have been used in shellac should be cleaned in
(1) oil, (2) turpentine, (3) alcohol, (4) gasoline.
8. Varnish brushes should be cleaned in
(1) linseed oil, (2) shellac, (3) turpentine, (4) water.
9. Glued joints are commonly strengthened with
(1) dowels, (2) rivets, (3) wire, (4) brads.


End of the test 


The foregoing objective examination is designed to function as a
paper-and-pencil test for measuring informational aspects of instruction
in woodworking. Tests of this type can be developed by the industrial
education teacher who will follow the principles outlined in this volume.
The objectives of the course of study must be definitely identified. The
rest of the process is largely the mechanical formulation of the selected
items in suitable objective form. Such tests of information in industrial
education are valuable as partial measures of achievement and teaching
success, but alone they are insufficient. They should be supplemented by
performance tests.


II. CONSTRUCTION OF OBJECTIVE PERFORMANCE TESTS


The testing of performance is not new to industrial arts and industrial
education. Manipulative trade tests were devised during the war and have
been used with varying degrees of satisfaction in industry. The reliability
of many of the early manipulative tests was low, and efforts to measure
manipulative skill have not been as successful as the measurement of
information by the use of the objective pencil-and-paper tests. A part of
this difficulty has arisen from trying to apply pencil-and-paper techniques
of test construction to manipulative-test construction without the necessary
modifications in the administrative procedure.




83. Steps in Preparing Performance Tests. 

In evaluating the available performance tests and in constructing 
performance tests in the general shop the authors have found the 
following steps worthy of careful consideration: 

1. Analyze the course of study to determine exactly what qualities 
may be tested. 

2. Decide what tools and materials will be necessary. 

3. Prepare a number of test exercises or make a composite exercise
that will offer the pupil an opportunity to provide an adequate
sample of his work with each tool or instrument and type of material
which it is desired to test.

4. Make a statement of procedure which tells the pupil exactly 
what to do in a vocabulary which is comprehensible at his grade 
level. 

5. Prepare a set of general directions for the pupil before the test 
is administered. 

6. Prepare directions for the examiner. 

7. Devise a method of scoring the test which provides an adequate 
measure of the results of each tool or instrument. 

8. Try out the test on a few students, and make the more obvious 
corrections. 

9. Make two or more forms of the test. 

10. Try out the test, and compute the reliability coefficient, standard
deviation, probable error, etc.
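
The computations named in step 10 can be sketched in a few lines of Python. This is an illustration only, not the authors' procedure: the pupil scores are invented, the function names are hypothetical, and it assumes that the reliability coefficient is taken as the product-moment correlation between scores on two forms of the test, with the classical probable error of r, 0.6745(1 - r^2)/sqrt(N).

```python
import math
from statistics import mean, pstdev


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def probable_error_of_r(r, n):
    """Classical probable error of a correlation coefficient on n cases."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


# Hypothetical scores of the same eight pupils on two forms of a test.
form_a = [42, 55, 38, 61, 47, 50, 33, 58]
form_b = [45, 52, 40, 63, 44, 51, 30, 60]

r = pearson_r(form_a, form_b)
print("reliability coefficient:", round(r, 3))
print("standard deviation, Form A:", round(pstdev(form_a), 2))
print("probable error of r:", round(probable_error_of_r(r, len(form_a)), 3))
```

With only a handful of cases the probable error is large, which is one reason for trying the test on a class-sized group before trusting the coefficient.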


For the purpose of illustrating the application of the principles of
performance-test construction let us consider the measurement of the
results of the following tool operations from a beginning woodworking
course.

1. Planing: side, end grain.

2. Sawing: ripping, cross-cutting to a line.

3. Boring: perpendicular to a surface.

4. Squaring: a line around a block.

5. Measuring: to 1/16 inch with try-square and rule.

6. Gauge a line parallel to a surface.


It should be kept clearly in mind that this is not a test of technique 
or speed but a test of quality or accuracy. The question is, how accu- 
rately can a pupil modify materials with these tools regardless of the 
method of handling them or the time required. 

The following tools are required: jack plane, try-square 6 inches, 
pencil, 24-inch folding rule, back saw, ripsaw, brace, 1/4-inch bit,




bench with vise, bench hook, and marking gauge. The tools must be 
in first-class condition.

The following materials are required: first-quality white pine free 
of knots, 1 inch thick, surfaced on two sides, and ends sawed at an 
angle. It is most desirable to have the materials used in a test of this 
type of uniform quality and the pieces of stock used by the pupils of 
about the same size and shape. 

A composite form of test exercise was selected for this test. 
Fig. 16 shows the working drawing for Form A, and Fig. 17 shows the 
working drawing for Form B. It will be observed that the main dif- 
ferences between the test exercises are in the dimensions; otherwise, 
there is the same opportunity for modifying wood with common tools. 


Form "B"

FIG. 17. [Working drawing of the Form B test block, showing the hole to be
bored; the dimensions are not legible.]


The validity of a performance test depends on providing samples of
the pupil's work and enough of the sample to give an adequate measure
of accuracy. Composite performance exercises are easier for the
pupil to visualize if they represent some familiar object or toy, but
it is seldom possible to do this without throwing the sampling of the
various tool operations out of balance.


WOODWORKING PERFORMANCE TEST OF ACCURACY
FORMS A AND B


Directions to Examiner: It is essential that the pupil shall understand the
exact procedure, and that he be able to visualize how the block is to look when
finished. The following directions are recommended:

1. Read aloud and distinctly the directions to the pupil while the class follows
silently. Answer any questions about the directions at this point.

2. Show the pupils a completed test block, and if they care to, let them
examine it.

3. When there are no further questions, say, "Get ready. Hold up the test
block. Begin work."

4. During the examination answer any questions about the steps in the
procedure by rereading the step in question with the pupil.

5. Observe the pupils as they work to make certain that they are doing all the
steps in the correct order.




6. Make certain that the proper tool is used where indicated, but do not 
tell the pupil how to use the tool. 

7. Help any pupil having difficulty in interpreting the working drawing, but
do not make any measurements on the test block for the pupil. This test
measures ability to measure to 1/16 in. with a ruler, but is not a measure of
ability to read drawings.

8. Take in the test block when the pupil has finished. The time is not im- 
portant, for this is a test of quality or accuracy as it applies to modifying wood 
with simple hand tools. 

Directions to Pupil. This is a test to determine how accurately you can use 
woodworking tools. The wood and all necessary tools will be given to you. 
The surfaces of the block of wood are numbered 1, 2, 3, 4, 5, 6. You will be 
given specific directions for doing the job and a working drawing that gives
all the necessary dimensions. Do this project as accurately as you can. Do 
not waste time, but do not work too fast to do your best work. The steps 
must be done in the order given. After you begin work do not ask unnecessary 
questions, but if you are in doubt about a step in the procedure or a dimension 
on the working drawing ask the examiner. Write your name and grade in school 
on surface No. 6 of the test block. Do not begin work until the examiner gives 
the signal.


WOODWORKING PERFORMANCE TEST, FORM A
PART I
Procedure:³


1. Select face No. 1. Plane it square and true and to the thickness indicated 
on the working drawing (Fig. 16). When finished re-mark No. 1. 

2. Select side No.2. Plane it square and true to surface No. 1. When finished 
re-mark No. 2. 

3. Select end No. 3. Plane it square and true to No. 1 and 2. When finished 
re-mark No. 3. 

4. Measure from end No. 3 toward end No. 5, and square a sharp pencil line 
across the block to the length indicated in the working drawing. Saw off the 
waste material with a back saw so that the stock will be as nearly the required 
length as you can make it. Do not plane. Re-mark end No. 5. 

5. From edge No. 2 gauge a line the length of the block, allowing the exact
width as indicated on the working drawing. Rip as near the exact width as
possible. Do not plane.

6. On surface No. 1 lay out the center for the hole and bore. 

7. When you have finished take your block to the examiner. 


84. Scoring Performance Tests. 

A try-square, a 1/4-inch dowel 3 inches long, and a scale measuring
in sixty-fourths are the tools used in scoring. The test is scored in 
units of sixty-fourths of an inch. If a measurement is more than 9/64
off, it is given a score of zero. If it is 1/64 off, it is given 9 points; 5/64,
5 points, etc., as shown in Table 27.
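
The scoring rule can be expressed as a short Python sketch. It is a hedged illustration: the function names are hypothetical, errors are assumed to be recorded in whole sixty-fourths of an inch, and the rule of one point lost per sixty-fourth (with zero beyond 9/64) is inferred from the values quoted above and the legible rows of Table 27.

```python
def tolerance_score(error_in_64ths):
    """Point score for one measurement.

    error_in_64ths: how far the measurement is off, in whole
    sixty-fourths of an inch (the sign is ignored).
    Assumed rule: 10 points for no error, one point less for each
    sixty-fourth, and zero for anything more than 9/64 off.
    """
    n = abs(error_in_64ths)
    return max(0, 10 - n)


def score_block(errors_in_64ths):
    """Total score for a test block from the errors of its measurements."""
    return sum(tolerance_score(e) for e in errors_in_64ths)


# Hypothetical block: five measurements, off by 0, 1, 3, 5, and 12 sixty-fourths.
errors = [0, 1, 3, 5, 12]
print([tolerance_score(e) for e in errors])
print("total:", score_block(errors))
```

Because each measurement is scored separately, the list of item scores also shows which tool operation cost the pupil the most points.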


3 Procedures for Forms A and B are identical although details are different. 




This test has been used a number of times by the authors in shop
measurement. The correlation of Form A with Form B on 30 cases
shows a reliability coefficient as high as .90. As has been pointed out,
this test is not concerned with speed and technique, but only with the
accuracy or quality of work. It is not difficult to construct valid and
reliable manipulative tests if the test worker keeps clearly in mind the
different measurable factors in industrial education. However, it is
well to remember that it requires skill to administer a manipulative
test. The examiner must follow the technique carefully. A fine tool
in the hands of the unskilled worker will not necessarily produce an
acceptable result.

TABLE 27

Limits of        Point
Tolerance        Scores

  0/64             10
  1/64              9
  2/64              8
  3/64              7
  4/64              6
  5/64              5
  6/64              4
  7/64              3
  8/64              2
  9/64              1

SUMMARY

The outline of the curricular content of the unit affords the basis for
validating the content of the test. With these objectives and the details
of the informational content of the course as a background, a
paper-and-pencil test is easily formulated by using the types of testing
techniques discussed in Chapter X.

Knowledge of the informational aspects of courses in industrial
education is only one of the important outcomes. Actual production
in the shop or laboratory is quite another important outcome, which,
within certain limits, is measurable. Quality in shop work may be
rated by inspection, by actual measurement of dimensions, and by
judgments based on quality scales. Inspection, if based on personal
judgments unsupported by objective criteria of quality, is highly
inaccurate. Actual measurement of the physical qualities (dimensions,
etc.) is probably the most objective. However, there are certain other
qualities in shop products which are not mere matters of dimensions
or accuracy of measurement or tool work. The evaluation of such
qualities requires the use of a rating scale. Such quality scales are
not merely useful for the measurement of accomplishment, but they
are particularly helpful in the development of an appreciation of
quality on the part of the pupil.


SUMMARY EXERCISES FOR DISCUSSION 


1. List and illustrate the constructive steps in validating an objective
information test in an industrial subject.

2. Following the general plan presented in the section in this chapter entitled
Course of Study Outline in Woodwork, prepare a similar list of specific
objectives for some other phase of industrial education.

3. What new suggestions can you add to the list of specific hints on securing 
reliability in an examination? 

4. Prepare at least ten objective test items of each of the four types illustrated 
on pages 140 to 144 of this chapter, using any industrial education subject- 
matter except woodworking. 

5. Prepare a performance test in metal working, printing, or mechanical drawing 
following the steps outlined in this chapter. 


SELECTED REFERENCES 


Garrett, Henry E., Statistics in Psychology and Education. New York:
Longmans, Green and Company, 1926.

Kelley, Truman L., Statistical Method. New York: The Macmillan Company,
1923.

Odell, C. W., Traditional Examinations and New-Type Tests. New York: The
Century Company, 1928.

Paterson, Elliott, Anderson, et al., Minnesota Mechanical Ability Tests. The
University of Minnesota Press, 1930, pp. 188-202.

Ruch, G. M., The Objective, or New-Type Examination. Chicago: Scott,
Foresman and Company, 1929.

Ruch, G. M., and Rice, G. A., Specimen Objective Examinations. Chicago:
Scott, Foresman and Company, 1930.

Symposium on Testing, Epsilon Pi Tau Review, Vol. II. The Ohio State
University, Columbus, 1931.


CHAPTER XII 


CONSTRUCTION AND USE OF SCALES FOR RATING 
INDUSTRIAL EDUCATION PROJECTS 


I. PROJECT RATING SCALES


85. Need for Scales for Rating Industrial Education Projects. 

The rating of shop projects and drawings is a difficult measurement 
problem which is faced by practically all industrial education teachers. 
The assignment of an objective rating to a shop product such as a
chair, a radio, a table, a lamp, a funnel, a dustpan, a cement birdbath,
or an isometric drawing calls for the keenest of discrimination
and for some method of objectifying standards of quality. Such
products of the shop are composed of many different parts, which,
combined, reflect quality in the object. For example, in the funnel, 
there are such factors as the forming, turning, wiring, seaming, and 
soldering. All these operations must be well done if the funnel is to 
have quality. Moreover, it must be made of the proper size and of 
suitable material. Thus size, shape, quality of material, suitability 
of material, and quality of workmanship as revealed in many small 
details must be recognized and evaluated by the judge. Each addi- 
tional characteristic sets up a number of possible variations in quality. 

The difficulty of grading or rating such products appears to be related
to the degree to which they vary from the typical. For example,
shop and drawing projects which are well executed and those which
are very inaccurate present a simpler marking problem than those
which are made up of mixtures of good, bad, and indifferent qualities.
This may be made clear by a concrete illustration. Let us assume
that a number of blocks or dominoes are to be arranged in a straight
line one inch apart with all faces parallel. If all the units are arranged
correctly it is not difficult to see that this is true. In a like manner,
if a shop project is very well done, it tends to radiate perfection.
Likewise, a very poor project is not difficult to distinguish. However,
the most difficult problem of measurement presents itself when part of
the dominoes are set up correctly, a few of them are down, and all
the rest are in varying degrees of alignment. The shop project which
shows excellent design, a poor finish, square edges, weak joints, rough
surfaces, and carefully rounded corners is likewise more difficult to
judge. If the design is given undue consideration, the project will be
rated too high. If only the joints and surfaces are considered, the
pupil may fail on the project.


86. Constructing a Project Rating Scale. 


The chief problem confronting shop and drawing teachers in their 
rating of projects is the definition of objective standards of quality. 
Industrial arts teachers have recognized the need for better methods of 
rating shop projects. Many realize that the rating of shop and draw- 
ing problems is so subjective and unreliable that it is difficult to assign 
the proper rating to a pupil’s project. In general, the suggestions for 
improvement in the reliability of rating shop projects indicate the 
desirability of combining the judgments on the different parts of the 
projects into a complete rating, and having the projects rated by three 
or more qualified judges. 

The authors have found the following principles helpful in con- 
structing a rating scale for shop and drawing projects: 


1. Make a careful analysis of the course of study for the purpose of 
selecting the factors to put in the rating scale. In general, the items 
rated are the changes made in materials by the use of tools or instru- 
ments, fasteners, and finishes. 


Example (from eighth-grade general woodworking unit) : Utility, design, 
proportion, nailing, squareness, dimensions, screws, joints, glued joints, boring, 
sawed edges, planed edges, sanding, and finish.

Example (from ninth-grade drawing): Neatness, dimensions, arrowheads, 
lines, accuracy, French curves, placement, joints, lettering, and circles. 


2. Group the factors into classes according to method of rating to
be used.


Example: Woodwork.

  Inspection.
    Utility.
    Design.
    Proportion.
    Finish.

  Physical measurement.
    Squareness.
    Dimensions.

  Rating scale or inspection.
    Nailing.
    Screw joints.
    Glue joints.
    Boring.
    Sawed edges.
    Planed edges.
    Sanding.
    Boring.
    Wood filing.

Example: Drawing.

  Inspection.
    Neatness.
    Placement.
    Arrowheads.

  Physical measurement.
    Circle.
    Accuracy.
    Dimensions.

  Rating scale or inspection.
    Lettering.
    Lines.
    Numbering.


3. Put the factors into a rating scale so that each part of the project
can be given an individual objective rating and the ratings combined.

Inspection by a critical and observant judge may reveal many defects
in quality. Checks on the physical measurements of products
by means of the rule and the square are very objective. Pieces of
material can be tested for squareness, thickness, width, and length. There
are, however, numerous quality factors which are less tangible and are
rated better with quality scales especially developed for the purpose.
Splicing, lettering, soldering, and sawing are examples of such qualities.

4. Prepare a set of directions for using the project rating scale. 

The project rating scale needs a carefully prepared set of directions
which explain in simple, direct language just how the scale is
used. This should include the method of recording the ratings on the
individual parts of the rating scale, the tools, and the quality scales
needed, as well as a statement of the method of arriving at the complete
score. A place should also be provided for recording the necessary
information about the pupil and the project, as, for example, the
student's name, grade in school, date of finishing project, name of
project, name of judge, and total score.

5. Prepare a key for transforming the distance ratings into objective
values for use in computing the composite rating. This step
is not necessary when the scale units are numbered on the scale.




87. Typical Project Rating Scales. 

On the following pages are excerpts from a project rating scale for
mechanical drawing which the authors have found helpful in their
classes and which may serve to illustrate the application of the
principles discussed in this chapter.¹


RATING SCALE FOR MECHANICAL DRAWINGS

Pupil's name ______________  Drawing ______________
School ______________  Date ______________
Name of judge ______________  Score ______________

Directions: The information required to rate the items is obtained by
inspection, quality scales, and physical measurement. Each item is rated on the
basis of 10 points. The total rating of the drawing is the sum of all the item
ratings which apply to the drawing.

Mark the items in the order in which they appear. 

Draw a circle around the figure on the scale to indicate your rating of the 
item. The scale is divided into ten (10) equal parts. The right side (10) indi- 
cates the highest mark; the center, average; the left, the lowest. A profile of 
the ratings may be made if desired by connecting the numbers representing the 
ratings assigned to each item which applies to the specific project. 


Example: If you think Utility under I. The Drawing should get a mark of
8, then draw a circle around the figure 8, thus:

I. The Drawing.
   1. Utility                               1 2 3 4 5 6 7 (8) 9 10

Instruments: Architect's scale, compass.
Quality Scales: The judges should have quality scales for lettering, figures,
and lines, each with ten or more samples of known value.²


I. The layout.
   1. The placing on paper                  1 2 3 4 5 6 7 8 9 10
      Is the object placed in such a manner as to permit a clear-cut drawing?
   2. Geometric construction                1 2 3 4 5 6 7 8 9 10
      How good is the instrument work?
   3. Trueness in meeting lines             1 2 3 4 5 6 7 8 9 10
      Do the lines meet truly and completely?
   4. Construction lines
      a. Accuracy                           1 2 3 4 5 6 7 8 9 10
         By how many thirty-seconds do the lines miss?
      b. Quality                            1 2 3 4 5 6 7 8 9 10
         Are the lines clean and sharp cut?

1 The latter part of this chapter is devoted to a discussion of a reliable method
of constructing quality scales.
2 Inspection must be used when quality scales are not available.




II. The finished drawing (pencil or ink).
   1. Circles.
      a. Neatness                           1 2 3 4 5 6 7 8 9 10
         Are all lines clean-cut and uniform?
      b. Trueness                           1 2 3 4 5 6 7 8 9 10
         Are the circles truly drawn?
      c. Accuracy                           1 2 3 4 5 6 7 8 9 10
         Does the diameter vary from the true dimension more than
         1/32 inch?
      d. Tangency                           1 2 3 4 5 6 7 8 9 10
         Do tangent lines fit in perfectly?
   2. Arcs and curves.
      a. Neatness                           1 2 3 4 5 6 7 8 9 10
         Are the lines clean-cut and uniform?
      b. Trueness                           1 2 3 4 5 6 7 8 9 10
         Are the curves and arcs truly drawn?
      c. Accuracy                           1 2 3 4 5 6 7 8 9 10
         Do lines vary more than 1/32 inch from given dimension?
      d. Tangency                           1 2 3 4 5 6 7 8 9 10
         Do tangent lines fit in perfectly?
      e. Completeness                       1 2 3 4 5 6 7 8 9 10
         Do the lines run up completely?
   3. Horizontal lines.
      a. Neatness                           1 2 3 4 5 6 7 8 9 10
         Are the lines clean-cut and uniform?
      b. Accuracy                           1 2 3 4 5 6 7 8 9 10
         Do lines vary more than 1/32 inch from given dimension?
      c. Trueness                           1 2 3 4 5 6 7 8 9 10
         Are all lines straight and horizontal?
      d. Completeness                       1 2 3 4 5 6 7 8 9 10
         Do the lines run short?
   4. Vertical lines.
      a. Neatness                           1 2 3 4 5 6 7 8 9 10
         Are all lines clean-cut and uniform?
      b. Accuracy                           1 2 3 4 5 6 7 8 9 10
         Do lines vary more than 1/32 inch from given dimension?
      c. Trueness                           1 2 3 4 5 6 7 8 9 10
         Are all lines straight and vertical?
      d. Completeness                       1 2 3 4 5 6 7 8 9 10
         Do lines run up completely?
   5. Skew lines.
      a. Neatness                           1 2 3 4 5 6 7 8 9 10
         Are the lines clean-cut and uniform?
      b. Accuracy                           1 2 3 4 5 6 7 8 9 10
         Do the dimensions vary more than 1/32 inch from given
         dimensions?
      c. Trueness                           1 2 3 4 5 6 7 8 9 10
         Are the lines straight and to the points?
      d. Completeness                       1 2 3 4 5 6 7 8 9 10
         Do lines run up completely?
   6. Dimension lines.
      a. Placing                            1 2 3 4 5 6 7 8 9 10
         Are they placed correctly and conspicuously?
      b. Quantity                           1 2 3 4 5 6 7 8 9 10
         Are the necessary lines in?
      c. Quality                            1 2 3 4 5 6 7 8 9 10
         Are they the true lines as to accuracy?
      d. Spacing                            1 2 3 4 5 6 7 8 9 10
         Are they spaced too close or too much?
      e. Correctness                        1 2 3 4 5 6 7 8 9 10
         Are they correctly put in?
      f. Arrowheads.
         (1) Quality                        1 2 3 4 5 6 7 8 9 10
             Are they neat and trim?
         (2) Accuracy                       1 2 3 4 5 6 7 8 9 10
             Do they come up to the extension lines?
      g. Extension lines                    1 2 3 4 5 6 7 8 9 10
         Do they run into the object?
   7. Dimensions.
      a. Legibility                         1 2 3 4 5 6 7 8 9 10
         Can they be read?
      b. Correctness                        1 2 3 4 5 6 7 8 9 10
         Are they correctly put in?
   8. Notes.
      a. Legibility                         1 2 3 4 5 6 7 8 9 10
         Can they be read?
      b. Clearness                          1 2 3 4 5 6 7 8 9 10
         Do they state exactly what is wanted?
      c. Lettering
         (1) Uniformity                     1 2 3 4 5 6 7 8 9 10
             Are the letters uniform as to height and slant?
         (2) Appearance                     1 2 3 4 5 6 7 8 9 10
             How are the letters formed?
         (3) Firmness                       1 2 3 4 5 6 7 8 9 10
             Are the strokes firm?
III. Summing up the drawing.
   1. Utility                               1 2 3 4 5 6 7 8 9 10
      Can the drawing be used?
   2. Value                                 1 2 3 4 5 6 7 8 9 10
      Is the drawing of any aid in making the project drawn?
   3. Appearance                            1 2 3 4 5 6 7 8 9 10
      Is the drawing executed in a neat, professional manner?
   4. Completeness                          1 2 3 4 5 6 7 8 9 10
      Is the drawing complete?
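
The rule in the directions, that the total rating is the sum of the item ratings which apply to the drawing, can be sketched in Python as follows. The item names, the ratings, and the function name are hypothetical, and the use of None for an item that does not apply is an assumption made only for this illustration.

```python
def total_rating(item_ratings):
    """Sum the ratings of the items that apply to a drawing.

    item_ratings maps an item name to a rating from 1 to 10, or to
    None when the item does not apply (an assumed convention).
    Returns (total, highest possible total, profile), where the
    profile lists the applicable items and their ratings in order.
    """
    profile = [(item, r) for item, r in item_ratings.items() if r is not None]
    for item, r in profile:
        if not 1 <= r <= 10:
            raise ValueError(f"rating for {item!r} must be from 1 to 10")
    total = sum(r for _, r in profile)
    return total, 10 * len(profile), profile


# Hypothetical ratings for a drawing that contains no arcs or curves.
ratings = {
    "placing on paper": 8,
    "geometric construction": 7,
    "circles: neatness": 6,
    "arcs and curves: neatness": None,  # item does not apply
    "utility": 9,
}
total, possible, profile = total_rating(ratings)
print(f"{total} of a possible {possible}")
```

Reporting the highest possible total beside the score keeps drawings with different numbers of applicable items comparable.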




88. Using Project Rating Scales. 

Not only does the project rating scale afford a more objective means 
of rating projects from the shop and drawing-room, but it is itself a
valuable teaching device. It aids the pupil in developing a proper 
appreciation of quality in workmanship. Its use by students in the 
rating of their own projects and those of their classmates gives them 
valuable opportunities for developing habits of careful analysis and 
experience in judging quality in workmanship. 

The best teaching results are obtained with the project rating scale
when it is used during the time the project is being made. The items
in the project scale should be arranged in an order which facilitates
the use of the project rating scale parallel with the development of the
project itself. The finish, utility, design, and proportion of a project
are better judged after the project is completed. The results of sawing,
gluing, squaring, screwing, turning, forming, soldering, riveting,
splicing, etc., are easier to judge just after they are completed and
before they have been obscured or modified by other tool operations
or parts of the project. Sawed or planed edges are often modified by
filing, scraping, and sanding. The quality of these operations must be
rated at a time when they give a true picture of the pupil's proficiency.

The general quality of a project is determined by the sum total
of the operations which go into its making. It is well for a pupil to
realize this, and to use the diagnostic value of the rating scales to
check the results of the tool operations, [...] rivet, make a splice, or
bore [...] pupil and the instructor should [...]

The pupil may [...] he may never be able to achieve [...] can and should
develop an appreciation of what constitutes quality in workmanship. One
generally accepted method of developing an appreciation of quality is to
allow the pupil to modify materials and to judge the results of his own
[...]

[...] teacher-training institutions and super[...]s. If proper training is
not available, the progressive industrial education teacher should make
project rating scales which are based on his [...]

[...] rating scale is more objective than the usual mark assigned by the
teacher, but it is distinctly more meaningful when the ratings of from
five to ten judges are pooled. When the pooled judgments of such
a number of trained observers are used, reliability coefficients of .90
or better are obtained. This is a satisfactory reliability of measurement
and produces a rating comparable to the best objective ratings
in other fields of instruction.
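
A minimal Python sketch of pooling follows. The text does not say how the judgments are pooled; the sketch assumes a simple average of each project's total rating over the judges who rated it, and the judges, projects, scores, and function name are all invented for illustration.

```python
from statistics import mean


def pool_ratings(ratings_by_judge):
    """Average each project's total rating over the judges who rated it.

    ratings_by_judge maps a judge's name to a dict of
    {project name: total rating}. Averaging is an assumed pooling rule.
    """
    projects = set()
    for ratings in ratings_by_judge.values():
        projects.update(ratings)
    return {
        project: mean(r[project] for r in ratings_by_judge.values() if project in r)
        for project in sorted(projects)
    }


# Hypothetical total ratings from five judges for three projects.
ratings_by_judge = {
    "Judge 1": {"lamp": 72, "dustpan": 55, "funnel": 64},
    "Judge 2": {"lamp": 78, "dustpan": 50, "funnel": 60},
    "Judge 3": {"lamp": 70, "dustpan": 58, "funnel": 66},
    "Judge 4": {"lamp": 75, "dustpan": 52, "funnel": 61},
    "Judge 5": {"lamp": 80, "dustpan": 60, "funnel": 59},
}
for project, rating in pool_ratings(ratings_by_judge).items():
    print(project, rating)
```

The pooled rating of each project is steadier than any single judge's mark because individual leniency or severity tends to cancel in the average.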


II. QUALITY SCALES FOR SHOP PRODUCTS


89. Constructing Quality Scales. 

Quality scales are useful measurement and teaching instruments 
which the shop teacher can easily develop and use in his course. Such 
scales are a very useful means of evaluating certain types of work 
done in the class. If made available for inspection and compari- 
son, they serve to set up standards for the students themselves to 
attain, thus aiding the pupil in developing an appreciation of quality. 
The pupils may rate their own and other pupils’ work, thus gaining 
real experience in rating and further developing a concept of what 
constitutes real quality in workmanship. 

The teacher of woodworking will find that quality scales dealing 
with such skills as sawing, boring-exit edge, fastening with screws, 
gluing, planing, sanding, and nailing are very helpful instruments to 
use in the more exact evaluation of shop products. The specimens 
constituting the scales can be mounted on suitable panels to be con- 
veniently available for inspection and use. The teacher of sheet- 
metal work needs quality scales showing the range of workmanship in 
riveting, soldered lap-seam work, wiring, locked-seam work, and turn- 
ing. This list may be reduced or expanded depending on the type of 
course being taught. 

Quality scales have been widely used in rating such school products 
as writing, lettering, and drawing. Scales of this type can be repro- 
duced and widely distributed, but quality scales dealing with pieces of 
material are not so easily duplicated. Quality scales for shop use 
should be made up of actual specimens of the products themselves.
They should be available to the students so that the specimens may be 
seen and handled. Such scales may be photographed, but in use the 
picture lacks the satisfaction resulting from a scale composed of actual 
samples of the work to be rated. 

Because of the real value which quality scales have in the measure- 
ment of industrial education and in the development of the pupil's 


158 CONSTRUCTION AND USE OF SCALES 


concept of quality, a method for developing such scales is presented 
in this chapter. 


90. Steps in Making a Quality Rating Scale. 


The following steps are essential in developing a quality scale for 
shop products: 

1. Secure samples of the type of product to be used in the scale. 

2. Arrange the samples in order of merit as determined by the judg- 
ment of ten or more competent judges. 

3. Determine the percentages of judges rating a given sample as 
better than each other sample. 

4. Arrange the specimens in order from best to poorest on the basis 
of these composite judgments. 

5. Find the deviation of the percentages from the median (50 per 
cent), retaining the sign of the deviation. 

6. Calculate the scale differences between the successive samples.

7. Assume a zero point, and place all samples along the linear scale
of values.

8. Select eight to twelve of the samples which nearest approach
uniform differences in quality for the quality scale, assign the proper
quality values to them, and mount them for use in the shop.


Each of these essential steps in constructing a quality scale is
discussed and illustrated in succeeding sections of this chapter. A quality
scale showing degrees of merit in soldering a lap seam is used for
illustrative purposes. Fig. 19 shows the values assigned in the
completed scale.

91. Securing Samples.


Twenty to forty representative samples ranging widely in quality
will usually be adequate for the construction of a satisfactory quality
rating scale for use in an industrial education course. In securing
samples for such a scale, it is essential that some of the samples be
superior, some poor [...] quality. Samples of average [...]
more difficult to get a proper [...] twenty to forty samples have
[...] inspection, it is obvious that there [...] should make a
[...] add them to the list. This practice is defensible, as it might
[...] to secure enough specimens showing [...] range. The use of
many more than forty [...] difficult for the judges to rate them
carefully and also adds considerably to the amount of calculation
necessary in the derivation of the scale.

Twenty-two samples were used in construction of the scale in Fig. 
19. It shows the quality of soldering on a lap seam. No samples 
were added to this group. The students who made the samples varied 
widely in their ability and background. 


92. Securing Independent Estimates of Relative Quality of the Samples.

The samples to be used in making the scale should be lettered or 
numbered to make them readily identifiable. Names of pupils who 
made them should not be attached, since they might influence the 
judges’ rating. After the samples are properly labeled, they should 
be given to the judges individually in random order. The judges are 
instructed to arrange them from poorest to best by comparison. The
judges may be shop teachers or tradesmen, but they should be persons
competent to rate the quality of the samples. The number of
judges should not be less than nine or ten. A larger number would
be better. If ten (or multiples of ten) judges can be secured, the
calculation of percentages is simplified. It might be permissible for
an isolated teacher who is unable to secure the cooperation of that
number of qualified judges to have the same judge rate the samples
more than once. If this procedure is followed, it is advisable to allow
a day or two between ratings to reduce the influence of memory in
placing the samples.

Table 28 gives the results obtained when nine judges ranked the
twenty-two soldered lap joints used in making the scale illustrated
here. Each of the twenty-two samples is designated by a letter.


TABLE 28
RANKING BY NINE JUDGES

[Table body not recoverable from the source: one row for each of Judges 1 to 9, giving the letters of the twenty-two samples in rank positions 1 to 22.]




93. Determining Percentage Ratings of Judges. 


The ratings of the twenty-two specimens by the nine judges are
tabulated in Table 28. The next step is the determination of the
percentage of judges rating a given sample as better than every other
sample. Table 29 gives these data for the nine judges and twenty-two
samples used in developing the quality scale for soldering lap joints.
It will be noted that the samples are listed in alphabetical order. This
table is to be read as follows: A is better than A no times because they
are the same sample, but B is rated better than A nine times, which
means all the judges considered it better than A, etc. It will be noted
that C is rated better than A four times, better than B zero times,
etc. This table, showing the number of judges rating each sample in
relation to every other sample, is the basis for the next step in
constructing the scale.
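
The tally behind a table such as Table 29 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' worksheet: the function name `pairwise_counts` and the three short rankings are hypothetical, and each ranking is assumed to list the samples from best to poorest with no ties.

```python
from itertools import permutations

def pairwise_counts(rankings):
    """Count, for every ordered pair (x, y), how many judges rated x better than y.

    Each ranking is a list of sample letters ordered from best to poorest.
    """
    samples = sorted(rankings[0])
    counts = {(x, y): 0 for x, y in permutations(samples, 2)}
    for order in rankings:
        position = {sample: i for i, sample in enumerate(order)}
        for x, y in permutations(samples, 2):
            if position[x] < position[y]:  # x stands nearer the top, so x is better
                counts[(x, y)] += 1
    return counts

# Hypothetical rankings by three judges of four samples (best first).
rankings = [
    ["B", "A", "C", "D"],
    ["B", "C", "A", "D"],
    ["A", "B", "C", "D"],
]
counts = pairwise_counts(rankings)
print(counts[("B", "A")])  # number of judges rating B better than A
```

With nine judges and twenty-two samples the same function would yield the 22 by 22 grid of counts, with the diagonal (a sample against itself) left out.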

The next step is to change the ratings to percentages. Table 30
shows the ratings in Table 29 after they have been changed to
percentages. Specimen B is rated better than A by nine judges, or 100
per cent. Specimen C is rated as better than A by four judges, or
44.4 per cent. This procedure when completed gives the results as
reported in Table 30.
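
The conversion is simple division by the number of judges. A minimal sketch follows; the function name `to_percentages` is hypothetical, and the two counts are the ones quoted above.

```python
def to_percentages(counts, n_judges):
    """Convert judge counts to percentages, rounded to one decimal place."""
    return {pair: round(100 * n / n_judges, 1) for pair, n in counts.items()}

# Counts quoted in the text: B over A by nine judges, C over A by four.
counts = {("B", "A"): 9, ("C", "A"): 4}
percentages = to_percentages(counts, n_judges=9)
print(percentages[("B", "A")])  # 100.0
print(percentages[("C", "A")])  # 44.4
```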


94. Determining Order of Merit of Samples. 

The rank order of the samples is determined by referring to the
sum of the ratings for each sample in Table 29. The sample with the
highest total rating is highest in quality, the one with the next highest
is second, and so on. This gives useful information for construction
of a quality scale, since it indicates the relative order as given by the
combined judgment of nine judges. However, it does not tell the
exact distance between each sample along a known scale. It is
necessary to know the relation of the respective samples to each other
before a number of samples can be selected to represent approximately
equal distances on the scale.
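
The summing and ranking can be sketched as follows. The function name `order_of_merit` is hypothetical, and only three of the twenty-two samples are used. The counts for B over A (9), C over A (4), and C over B (0) are those quoted from Table 29; the remaining three are their complements, on the assumption that each of the nine judges places every pair one way or the other.

```python
def order_of_merit(counts):
    """Total the 'rated better' counts for each sample and rank from best to poorest."""
    totals = {}
    for (better, _other), n in counts.items():
        totals[better] = totals.get(better, 0) + n
    ranked = sorted(totals, key=totals.get, reverse=True)
    return ranked, totals

counts = {
    ("B", "A"): 9, ("A", "B"): 0,
    ("C", "A"): 4, ("A", "C"): 5,
    ("B", "C"): 9, ("C", "B"): 0,
}
ranked, totals = order_of_merit(counts)
print(ranked)  # best first
print(totals)
```

The totals give the order only. They say nothing about how far apart neighboring samples lie, which is what the scale differences supply.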


95. Determining the Scale Differences.

The first step is to find the deviation of the percentages from the
median (50 per cent). This is accomplished by subtracting the
median (50 per cent) from the percentage values for each rating as given
in Table 30. These percentages are then expressed as positive or
negative deviations from the median (Table 31). All cases of 100 per cent
and zero per cent are omitted from the table since they do not operate
to affect the results. After the technique of developing quality scales
is mastered, the student will find it convenient to omit the preparation


TABLE 29

RATING OF SPECIMENS BY JUDGES

(Table shows the number of judges rating specimens indicated in the horizontal lists as better than those in the vertical columns.)

[Table body not recoverable from the source.]


TABLE 30

PERCENTAGE OF JUDGES RATING EACH SPECIMEN BETTER THAN ANOTHER

[Table body not recoverable from the source.]


TABLE 31

PERCENTAGE DEVIATIONS FROM MEDIAN (50 PER CENT)

[Table body not recoverable from the source.]




The percentage values which are given in Table 31 are referred
to any table showing the fractional parts of the total area of the
normal probability curve corresponding to designated distances on the
base line. Such a table as Table X on page 91 of Garrett's Statistics
in Psychology and Education (Longmans, Green and Co.)
or Table 51 on page 219 of Thorndike's Mental and Social Measurements
(Teachers College, Columbia University, 1916) affords the
necessary data. These tables show, for example, that a sample
which was rated as better than Sample A by 44.4 per cent of the
judges (Table 30) lies below the quality of Sample A a distance of
5.6 per cent (Table 31), and is actually below the median of a
normal distribution of such qualities a distance of -0.14 standard
deviation unit. Specimen E was rated as better than A only 11.1 per
cent of the time. It therefore is 38.9 per cent below the median point
represented by A. Table X in Garrett, or Table 51 in Thorndike,
indicates that a specimen of such quality lies 1.22 sigma units below
the quality of Specimen A. All the sigma-unit values given in Table
32 were obtained from these tables in a similar manner.
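
Where a printed table of the normal curve is not at hand, the same lookup can be sketched with the Python standard library. This is an illustration under stated assumptions and not the authors' procedure: the function names are hypothetical, and `statistics.NormalDist.inv_cdf` is used in place of Garrett's or Thorndike's table.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def deviation_from_median(pct_better):
    """Percentage deviation from the 50 per cent median, sign retained."""
    return pct_better - 50.0

def sigma_units(pct_better):
    """Sigma-unit distance for a percentage of judges.

    Cases of 0 and 100 per cent have no finite sigma value and are
    omitted (None), as in the text.
    """
    if pct_better <= 0.0 or pct_better >= 100.0:
        return None
    return STANDARD_NORMAL.inv_cdf(pct_better / 100.0)

pct = 44.4  # Specimen C rated better than A by 44.4 per cent of the judges
print(round(deviation_from_median(pct), 1))  # -5.6, as quoted in the text
print(round(sigma_units(pct), 2))            # -0.14, as quoted in the text
```

A negative value means the specimen lies below the one it is compared with; a positive value means it lies above.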

Table 32 gives the standard deviation distance between each sample
and all other specimens not at the median or at the extreme end of the
scale. These data now make it possible to compute the scale difference
of the samples. The formula³ for obtaining the scale differences is
as follows: $S_{ab} = \dfrac{\sqrt{2}\,\sum (x_a - x_b)}{N}$, in which
$S_{ab}$ equals the scale separation of the samples in sigma units,
$\sum (x_a - x_b)$ equals the sum of the sigma-unit differences for the
two specimens, and $N$ is the number of such differences.

3 This formula and the proof for same are given by Thurstone in Journal of
General Psychology, Vol. 1: 405-423, 1928.

Example: Differences of Samples F and H. They rank third and fourth in
the series of twenty-two samples of soldering. (See Table 29.)

                      Sigma-Unit
    F         H       Differences
  -1.22     -1.22        0
  +0.76     +0.14        0.62
  -1.22     -0.76        0.46
  -1.22     -1.22        0
  -1.22     +0.76        1.98
  -0.76     -0.76        0
  -1.22     -0.76        0.46
  +1.22     +1.22        0

Scale difference = $\sqrt{2} \times 3.52 / 8 = 1.414 \times 3.52 / 8 = 4.9772 / 8 = 0.62$
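
The arithmetic of the F and H example can be checked with a short sketch. The function name `scale_difference` is hypothetical. The eight differences are entered without sign, as they are totaled in the worked example (sum 3.52, N = 8); the two zero entries at the ends of the list are inferred from that total.

```python
import math

def scale_difference(sigma_differences):
    """Scale separation: sqrt(2) times the mean of the sigma-unit differences."""
    n = len(sigma_differences)
    return math.sqrt(2) * sum(sigma_differences) / n

# Sigma-unit differences between Samples F and H (total 3.52, N = 8).
f_h_differences = [0, 0.62, 0.46, 0, 1.98, 0, 0.46, 0]
print(round(scale_difference(f_h_differences), 2))  # 0.62
```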


TABLE 32

SIGMA DIFFERENCES EXPRESSED AS DEVIATIONS FROM THE MEDIAN (50 PER CENT)

(From Table 51 in Thorndike's Mental and Social Measurements)

[Table body not recoverable from the source.]




The application of the same method to the other samples results 
in the scale differences shown in Table 33. This table gives the rank 
order according to the nine judges and the scale differences between 
each sample on a comparable basis. It is known that the distance 
from T to D is 1.09 sigma units, and on the same scale the difference 
between D and F is 0.16 unit. 


TABLE 33
SCALE DIFFERENCES BETWEEN SPECIMENS

             Difference
Samples      in Units
T-D            1.09
D-F            0.16
F-H            0.62
H-K            0.87
K-O            0.25
O-M            0.57
M-V            0.49
V-L            0.31
L-N            0.33
N-C            0.57
C-E            0.48
E-J            0.41
J-G            0.98
G-S            0.29
S-Q            0.51
Q-U            0.66
U-A            0.64
A-R            0.56
R-P            0.59
P-I            0.88
I-B            0.29


96. Establishing a Point of Origin for the Scale. 


The next problem is to establish a zero point. It is permissible in 
a quality scale of this type to assume a zero point. According to the 
rating of the nine judges, sample T was rated the poorest of all the 
samples. The question then arises: Is sample T the poorest conceiv- 
able soldered lap joint? The answer is that it is very poor, but could 
be worse and still hold together. Therefore, for the purposes of this 
scale, Specimen T is assumed arbitrarily to have a value of 0.9 unit 
above zero. There probably could not be a soldered lap joint of zero 
quality since it would not hold together at all and could not be con- 
sidered a soldered joint. In assuming a zero point for a quality scale 
in industrial education, it seems advisable to select a point from 0.8
to 1 above zero. After all, such a zero point or point of origin for
the scale is arbitrarily set up, whatever the method used.

With the zero point assumed as 0.9 unit, the twenty-two samples
are then ranked along the scale by adding the scale differences to the
assumed zero according to the relative rankings on the scale. Table
33 shows the sigma-unit differences in quality of each adjoining pair
of the twenty-two samples. Table 34 shows the scale values assigned
to each specimen. Sample T, being the specimen of poorest quality, is
given a scale value of 0.9, on the assumption that it has a quality
value of approximately 0.9 unit above the point designated as the
arbitrary zero point of the scale. Sample D is 1.09 sigma units better
than Sample T; therefore the scale value assigned to Sample D is
1.09 + 0.90 = 1.99 units above zero quality. Each exercise in ascending
order of merit is assigned a scale value corresponding to its merit
in relation to the next poorer specimen. The values in the parentheses
in Table 34 are sigma units of difference between the pairs of
specimens. The ascending values are the scale units of value or merit
assigned to each of the twenty-two specimens comprising the scale.
A graphic presentation of the relationship of these specimens to the
scale and to each other is given in Fig. 18.
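
The running addition described above can be sketched as a cumulative sum. The differences are those of Table 33 and the starting value is the assumed 0.9 for Sample T; the function name `scale_values` is hypothetical.

```python
from itertools import accumulate

# Specimens from poorest to best, with the Table 33 differences between neighbors.
ORDER = "T D F H K O M V L N C E J G S Q U A R P I B".split()
DIFFERENCES = [1.09, 0.16, 0.62, 0.87, 0.25, 0.57, 0.49, 0.31, 0.33, 0.57, 0.48,
               0.41, 0.98, 0.29, 0.51, 0.66, 0.64, 0.56, 0.59, 0.88, 0.29]
ZERO_OFFSET = 0.9  # assumed distance of Sample T above the arbitrary zero

def scale_values(order, differences, zero_offset):
    """Add each scale difference to the running total, starting at the zero offset."""
    running = accumulate(differences, initial=zero_offset)
    return {sample: round(value, 2) for sample, value in zip(order, running)}

values = scale_values(ORDER, DIFFERENCES, ZERO_OFFSET)
print(values["D"])  # 1.99, that is, 1.09 + 0.90
print(values["B"])  # 12.45, the best specimen
```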

After the samples are ranked and scaled, the final step is to select
samples suitable for use in the quality scale. In doing this, the
authors have found that from eight to twelve samples make a satisfactory
scale for checking quality in industrial education. The first object
is to select samples for the scale whose quality values represent
approximately equal distances along the scale. This gives the samples
a definite rating. The scale can then be used as a measuring instrument
for rating quality. Fig. 19 shows a scale with eight samples and
another with thirteen samples. Both of these scales are taken from
the combined data in Fig. 18 (Table 34). It frequently occurs that
several samples will fall very close together on the scale, as D and F
in Fig. 18. This means that two samples of approximately equal
quality are available for that point on the scale. This also explains
the unnecessary computation involved when a large number of samples
of approximately the same difficulty are used, because several samples
fall at about the same place on the scale and only one or two are
needed for that point to make the quality scale.


[Figure: the twenty-two specimens placed along a linear scale marked from 0 to 13, in the order T, D, F, H, K, O, M, V, L, N, C, E, J, G, S, Q, U, A, R, P, I, B.]

FIG. 18. Specimens Assigned to a Linear Scale (Zero assumed to be 0.9 step
below sample T).


Scale 1

Specimen   T    D    H    K    M    L    C    J    G    Q    A     P     B
Value      0.9  2.0  2.8  3.6  4.4  5.3  6.1  7.0  8.0  8.9  10.1  11.3  12.4

Scale 2

Specimen   T    F    O    L    J    Q    R     B
Value      0.9  2.1  3.9  5.3  7.0  8.9  10.7  12.4

FIG. 19. Two Quality Scales Selected from the 22 Specimens.


TABLE 34
SCALE VALUES

Zero assumed to be 0.9 step below sample T.

Specimen   Difference   Scale Value (S)
T                          0.90
D            (1.09)        1.99
F            (0.16)        2.15
H            (0.62)        2.77
K            (0.87)        3.64
O            (0.25)        3.89
M            (0.57)        4.46
V            (0.49)        4.95
L            (0.31)        5.26
N            (0.33)        5.59
C            (0.57)        6.16
E            (0.48)        6.64
J            (0.41)        7.05
G            (0.98)        8.03
S            (0.29)        8.32
Q            (0.51)        8.83
U            (0.66)        9.49
A            (0.64)       10.13
R            (0.56)       10.69
P            (0.59)       11.28
I            (0.88)       12.16
B            (0.29)       12.45
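
One mechanical way to look for samples at roughly equal distances is to lay evenly spaced target values between the poorest and best specimens and take the nearest unused specimen to each. This is only a sketch under that assumption, not the procedure the authors followed, and its choices need not coincide with the scales of Fig. 19, which were selected by judgment. The function name `pick_evenly_spaced` is hypothetical; the scale values are those of Table 34.

```python
# Scale values from Table 34.
SCALE_VALUES = {
    "T": 0.90, "D": 1.99, "F": 2.15, "H": 2.77, "K": 3.64, "O": 3.89,
    "M": 4.46, "V": 4.95, "L": 5.26, "N": 5.59, "C": 6.16, "E": 6.64,
    "J": 7.05, "G": 8.03, "S": 8.32, "Q": 8.83, "U": 9.49, "A": 10.13,
    "R": 10.69, "P": 11.28, "I": 12.16, "B": 12.45,
}

def pick_evenly_spaced(scale_values, k):
    """Choose k specimens whose values lie nearest to k evenly spaced targets."""
    low, high = min(scale_values.values()), max(scale_values.values())
    step = (high - low) / (k - 1)
    remaining = dict(scale_values)
    chosen = []
    for i in range(k):
        target = low + i * step
        nearest = min(remaining, key=lambda s: abs(remaining[s] - target))
        chosen.append(nearest)
        del remaining[nearest]  # each specimen may be used only once
    return chosen

selection = pick_evenly_spaced(SCALE_VALUES, k=8)
print(selection)
print([SCALE_VALUES[s] for s in selection])
```

The teacher would still inspect the chosen specimens, since two samples of nearly equal value (such as D and F) may differ in how clearly they show the quality in question.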

Fig. 20 shows photographic reproductions of scales of quality for
end-splices, underwriter's knots, solder joints, dados, and flat- and
round-head screws.

[Fig. 20: photographic reproductions of quality scales. Panel titles legible in the source: UNDERWRITER'S KNOTS; SOLDERED LAP JOINTS.]


97. Reliability of Ratings on Quality Scales.


The question may now be asked, are quality scales reliable measuring
instruments? Both the reliability of the pooled judgments of experts
and the reliability of ratings on quality scales have been shown
to be high by Paterson, Elliott, Anderson, and others⁴ in developing a
criterion of mechanical ability. These workers found the pooled
judgments of ten qualified judges to have a reliability of .90 or better.
They also found the reliability of quality scales to be high when the
ratings of two or more judges are averaged.

4 Paterson, Elliott, Anderson, and others, Minnesota Mechanical Ability Tests,
The University of Minnesota Press, Minneapolis, Minnesota, pp. 194-202.


The authors have found that the reliability of rating of shop 
products by means of quality scales is much higher than that normally
obtained under subjective conditions. However, it is advisable to 
average three or four independent ratings if a reliability of .90 or 
better is desired. These ratings may be made by three or four quali- 
fied teachers or by the same teacher at widely separated intervals. 
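
The text gives no formula for the gain from averaging. One standard way to reason about it, offered here as an outside assumption and not as the authors' method, is the Spearman-Brown formula, which estimates the reliability of the mean of k comparable ratings from the reliability of a single rating. The single-rating reliabilities used below are hypothetical.

```python
def pooled_reliability(single_r, k):
    """Spearman-Brown estimate of the reliability of the mean of k comparable ratings."""
    return k * single_r / (1 + (k - 1) * single_r)

# Hypothetical single-rating reliabilities; neither figure comes from the text.
print(round(pooled_reliability(0.75, 3), 2))  # three ratings averaged
print(round(pooled_reliability(0.70, 4), 2))  # four ratings averaged
```

Under this assumption a single rating with a reliability in the neighborhood of .70 to .75 would reach about .90 when three or four independent ratings are averaged.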


SUMMARY 


The rating of shop and drawing products involves problems which 
confront practically all industrial education teachers. The subjective 
ratings of shop and drawing teachers are as unreliable as teachers' 
marks in other subject-matter fields. The rating of a shop project 
or a drawing requires the making of a complex judgment involving 
many variables. The reliability of industrial education teachers’ 
marks can be improved by using a project rating scale. 

Shop and drawing projects are rated by inspection, physical 
measurement, and quality scales. A project rating scale lists the vari- 
able characteristics of the project upon which the judgment is based 
so that they may be considered individually, and the total rating is the 
sum of the individual ratings. The project rating scale is also a 
valuable teaching device for developing appreciation of those factors 
which produce quality in workmanship and in diagnosing individual 
pupil difficulties. 

The development of a quality scale involves the collection of
representative specimens of varying merit, the arrangement of these
samples in order of merit by means of the use of pooled judgments,
the calculation of the differences in quality of the specimens in terms
of sigma units, the establishment of a zero point, and the arrangement
and evaluation of the specimens along the scale in relation to one
another and to the point of origin of the scale. The technique of pooled
judgments is probably sufficiently reliable for most purposes when as
many as ten or more qualified and critical judges are used. Quality
scales are reasonably reliable measuring instruments, but reach their
highest efficiency when the ratings of three or more judges are
averaged.


SUMMARY EXERCISES FOR DISCUSSION 


1. Point out the distinguishing features of project rating scales and quality rating
scales for shop products.

2. Make a project rating scale for some industrial art field other than mechanical
drawing. The mechanical drawing rating scale on pages 153 to 155
may be followed as an example.

3. Recapitulate the steps in preparing a quality rating scale.

4. In securing specimens for the construction of a quality rating scale, why is
it essential that a wide range of quality be sampled?

5. What importance do you see in the establishment of the zero point of a
quality scale?


SELECTED REFERENCES


ERICSON, Emanuel E., “Grading Shopwork,” Industrial Education Magazine,
Vol. 28: 227-228, January, 1927.

HACER, Cann J., “A Systematic Method of Grading Shop Projects,” Industrial
Arts Magazine, Vol. 18: 376-378, October, 1929.

KELLEY, Truman L., Statistical Method. New York: The Macmillan Company,
1923.

MANGER, Emerson W., “Grading Drawings,” Industrial Education Magazine,
Vol. 24: 116-117, October, 1922.

PATERSON, ELLIOTT, ANDERSON, et al., Minnesota Mechanical Ability Tests. The
University of Minnesota Press, 1930, pp. 188-202.

THURSTONE, L. L., Journal of General Psychology, Vol. 1: 405-423, 1928.


CHAPTER XIII 


RATING AND DEVELOPING PERSONALITY AND 
CHARACTER TRAITS 


98. Importance of Character and Personality Traits. 


The very commonplaceness of the terms character and personality 
may account for the fact that they have been given no very exact
definition. In a vague way we know what character is, and in a similarly
vague way we speak of personality, yet probably no other single out- 
come of our social or educational programs is so important as the 
development of proper character and personality traits. Every 
serious-minded teacher, parent, or pupil realizes the importance of a 
good character and a pleasing personality. But what is character? 
What is personality? Can they be developed, or are they inherent? 
Are we mere slaves of heredity doomed to act and to think in the same 
way in similar environmental conditions, or is our destiny to a certain 
extent in our own control? 

Character has been defined in a discussion of psychology for
teachers as “the sum total of his [an individual's] behavior in
relation to the world about him and to his fellow beings.”¹ Thus,
character is taken to be synonymous with the behavior aspects of
personality. Character is revealed by an individual's responses to
stimulation. If the responses are those commonly considered pleasing or
acceptable to his fellows, the individual is described as having a
“good” character. If the reverse, he has a “weak” character.

Lay usage has tended to attach to personality the notion of in- 
dividuality or distinctiveness in character. In many respects this has 
had an unfortunate effect, for in their efforts to achieve distinctive- 
ness many individuals have revealed phases of character which are 
far from pleasing. Individuality is undoubtedly important but not so 
important that it should be achieved at the expense of honor, honesty,
morality, bravery, modesty, or other attributes of character, which, in
the past at least, have been considered desirable.

1 Benson, C. E.; Lough, J. E.; Skinner, C. E.; West, P. V., Psychology for
Teachers, Ginn and Company, Boston, 1926.

Much damage has been done by self-styled "psychologists" who 
have filled the literature of today with high-sounding suggestions by 
which an individual is to find his latent powers and suddenly develop 
a strong and forceful personality. Valentine² states that a “little
exaltation will hurt no one. It is a healthful sort of feeling, but is no
substitute for intelligence, vocational ability, moral habits,
leadership, culture, social habits, or any other desirable quality which
inheritance and creative experience alone can supply.” The school, the
shop, and the home face a most difficult and critical problem in pro- 
viding the right kind of stimuli for the development of desirable char- 
acter and personality traits. 

Personality can be modified and improved through diligent effort 
over a period of time, but it is not a simple task which can be accom- 
plished in a few weeks or months. Personality needs always to be 
modified and developed in the light of changing social and vocational 
conditions. Obviously, since our personalities have resulted from the
modification of our original natures by our environment, we can con- 
sciously develop more desirable modes of conduct by selecting the 
type of adjustments and then developing habits accordingly. It is 
true that too often personality emerges with little conscious knowl- 
edge on the part of the pupil of what the desirable traits are which 
have been found essential to success in life. However, the pupil and 
teacher could at least in part direct the development of personality if 
they have in mind an ideal toward which to work.

Jones³ states that “In general, personality is the same as
individuality: it is that group of qualities and characteristics that makes
one an individual, that set him off from other individuals. As such,
it is the sum total of abilities, skills, interests, and physical and
mental characteristics that he possesses, or better still the
combination of all of these.”


As an illustration, let us consider the modern automobile. Its
present state of development is the partial realization of an ideal of
transportation which has been taking shape during a third of a
century. It required much planning and testing to bring the automobile
to its present state of efficiency. Today we use adjectives such as the
following to describe its traits: durable, economical, safe, speedy,
comfortable, and dependable. The characteristics of the automobile are
constantly being refined so that they can better meet the changing
social and economic conditions of the times. As long as automobiles
are used there will be a constant need for refinement and adaptation.
The same is true of a human personality. It is the result of the
interplay of many variable factors, and even after it is well developed
it needs constant refinement in order to keep adjusted to changing
environmental conditions.

2 Valentine, P. F., The Psychology of Personality, D. Appleton and Company,
New York.

3 Jones, A. J., Principles of Guidance, McGraw-Hill Book Company, New
York, 1930, Chapter X, p. 148.


99. Measuring Personality and Character Traits.


Devices for the measurement of personality and character traits 
have generally taken the form of rating scales rather than of tests. 
However, a number of these devices are so arranged that the individual 
records his own reactions quite objectively. In this respect they re- 
semble tests. Such instruments differ from the typical rating scale 
also in that the individual responding to the exercises is frequently 
not aware of the fact that he is being measured for any particular 
quality or trait. For example, in the Bernreuter Personality
Inventory⁴ the subject is asked to indicate his reaction to 125 questions by
encircling one of the answers Yes, No, or ? which precedes each
question. The following samples taken from the test itself will illustrate
the types of exercises used:

1. Yes No ? Does it make you uncomfortable to be “different” or
unconventional?

2. Yes No ? Do you day-dream frequently?

5. Yes No ? Do you ever give money to beggars?

15. Yes No ? Do you usually object when a person steps in front of
you in a line of people?

25. Yes No ? Do you study the motives of other people carefully?

50. Yes No ? Do you usually try to avoid arguments?

100. Yes No ? Do you prefer to be alone at times of emotional stress?


By the arrangement of the material in the test several different
aspects of personality are measured at one time. According to the
author, the scales used in the scoring of the responses to the test are
very reliable. This may be due in part to the fact that the traits
measured are not readily detectable from the test itself. The
significance of the individual's response to the questions is brought out by
the use of four separate scales in the scoring of the answers. For
example, Scale B1-N is a measure of neurotic tendency. Persons who
score high on this test tend to be emotionally unstable. In the case
of the exercises used in the sample above, a person who answers
question 1 with “Yes” scores +2 points. A “Yes” on question 2 adds
five more points. On the other hand a “No” for question 5 deducts
6 points. When Scale B2-S, the scale for self-sufficiency, is used on
these same answers, however, a reply of “Yes” for question 1 gives
a score of -4; a “Yes” on question 2 gives a score of +1; a “No” on
question 5 gives a score of -3 points; etc.

4 Bernreuter, Robert G., The Personality Inventory, Stanford University
Press, Stanford University, California, 1931.

The remaining scales, B3-I for introversion-extroversion, and B4-D 
for dominance-submission, are applied in a similar manner. Scoring 
the individual’s response to the questions by each of these four scales 
gives rise to four sets of personality scores for each of which norms 
are available. Persons scoring high in self-sufficiency are the types 
who prefer to be alone, do not seek sympathy or encouragement, and 
tend to follow their own inclinations rather than seek the advice of 
others. Persons scoring high on the introversion-extroversion scale are 
inclined to be imaginative. Those scoring low on this scale rarely 
worry and prefer to act rather than to dream. Persons scoring high 
on the dominance-submission scale tend to dominate others in face-to- 
face situations. 
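
The scoring of one set of answers by more than one scale can be sketched as a lookup of weights. Only the six weights quoted in the text (questions 1, 2, and 5 on Scales B1-N and B2-S) are used. Every other answer is given a weight of zero purely for illustration, which is an assumption and not the published scoring key, and the names `WEIGHTS` and `score` are hypothetical.

```python
# Weights quoted in the text, keyed by (question number, answer).
WEIGHTS = {
    "B1-N": {(1, "Yes"): +2, (2, "Yes"): +5, (5, "No"): -6},
    "B2-S": {(1, "Yes"): -4, (2, "Yes"): +1, (5, "No"): -3},
}

def score(answers, scale):
    """Sum the weights for one person's answers on the named scale.

    Answers with no quoted weight count as zero here (illustrative assumption).
    """
    key = WEIGHTS[scale]
    return sum(key.get((question, answer), 0) for question, answer in answers.items())

answers = {1: "Yes", 2: "Yes", 5: "No"}
print(score(answers, "B1-N"))  # 2 + 5 - 6 = 1
print(score(answers, "B2-S"))  # -4 + 1 - 3 = -6
```

The same three answers thus contribute differently to each scale, which is how one set of responses yields four separate personality scores.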

Analysis of personality such as is afforded by the Bernreuter Per- 
sonality Inventory has been used with success and considerable reli- 
ability with high-school students, college students, and adults. The 
inventory itself is self-administering, there are no time limits, and 


each person interprets the questions for himself.

Another type of attempt to secure an unbiased picture of certain
personality traits without the subject's being completely aware of the
traits on which he is to be measured is represented in the Loofbourow- 
Keys Personal Index. According to the statement in the manual for 
the test itself, it “is an instrument for the detection of attitudes in- 
dicative of problem-behavior. It is intended for use in group surveys 
to identify those boys whose personal and social maladjustment is 
such that they are, or are in danger of becoming, serious disciplinary 
problems.” It is standardized for use in the junior-high-school grades, 
although it has been found useful in senior-high-school and in con- 
tinuation-school groups. Brief samplings from each of the four test 
parts comprising the battery are given here for illustrative purposes. 
The total number of exercises in each test is indicated in the samples. 


5 Loofbourow, Graham C., Keys, Noel, Personal Index; Educational Test
Bureau, Inc., Minneapolis, 1933.


176 PERSONALITY AND CHARACTER TRAITS 


Test 1 


Directions: This is a test of your word knowledge. Put an X in front of each
word you know. There are 100 words.

___ perceive
___ restore
___ grole
___ luxury
___ Trettle
___ verify
___ proportion
___ galine
___ exceed
___ patient

Test 2

Directions: Below are some words and phrases with some statements about
each one. Mark an X in front of the one statement under each word or phrase
which tells best how you feel about the thing named. Mark only one statement
under each one.

(Seventeen items.)

1. Chums:
___ It is hard to go without them.
___ You cannot always trust them.
___ They sometimes squeal on you.
___ They help you lie out of things.

3. Teachers:
___ They work hard.
___ They know they can punish you.
___ They are not fair to you.
___ They are kind of cranky.

10. Policemen:
___ They have it in for the kids.
___ They are glad to help you out.
___ It is fun to fool them.
___ They are just big bluffs.


Test 3

Directions: Read carefully and underline the one response which makes the
best answer for you. Underline only one.

(Twenty-one items.)

1. Do you call another person by a nickname he or she does not like?
   Almost always     Sometimes     Hardly ever
2. Do you keep right on studying when the teacher goes out of the room?
   Almost always     Sometimes     Hardly ever
21. Do you speak pleasantly to all the people you know even if you do not
   like them?
   Almost always     Sometimes     Hardly ever




Test 4 


Directions: Answer every question as truthfully and honestly as you can by
drawing a line under the right answer, as shown in the samples.

A. Do you eat more than once a week?                           Yes   No
B. Would you rather have a dime than a dollar?                 Yes   No

(Eighty-nine items.)

1. Would you like to wear expensive jewelry, rings, etc.?      Yes   No
2. Do you feel bored a good deal of the time?                  Yes   No
50. Are you anxious to get away from school and get a job?     Yes   No
89. Do you know anybody who is trying to do you harm or
    hurt you?                                                  Yes   No


The test is constructed in such a way that the undesirable responses 
are the ones scored. The “problem” responses were determined by 
comparing the answers given by “problem” boys with those made by 
others of the same age and intelligence. Test 1, False Vocabulary, is 
scored by allowing 1 point for each fictitious word. The possible score 
on this part is 30 points. In Test 2, Social Attitudes, three of the four 
possible answers are socially unacceptable. Each answer so marked 
counts one error. The score is the number of errors multiplied by 2. 
The possible score is 34 points. Test 3, Virtues, is scored by allowing 
1 point for each fault confessed. The score is obtained by multiply- 
ing the number of confessed faults by 2. The possible score is 42 
points. For Test 4, the Adjustment Questionnaire, the score is the 
number of “problem” responses. The possible score is 89 points. 

The sum of the four scores listed above gives the subject’s personal
index. The highest possible index is 195 points. The authors' statement
of the significance of these personal indices is quoted from the
examiner’s manual for the test:⁶ “An index of 30 or less is clearly
insignificant as regards problem behavior . . . A score of 40 or higher,
however, strongly suggests an unwholesome trend, since such scores are
made by three out of four reform school boys, as compared with only
one in five of the others. Scores of 50 to 60 are much more highly
indicative, and occur but rarely in unselected groups.

“By noting those boys who show high personal indexes, say 40 or 
over, principals, teachers, and counselors will have early brought to 
their attention those individuals most likely to become disciplinary 
problems and presumably in gravest need of observation and counsel.” 

The reliability of the battery is approximately .90. 
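
The arithmetic of the personal index may be sketched as follows. The function names and the band labels are our own, chosen only for illustration; the multipliers, the possible scores, and the cut-off points of 30, 40, and 50 are those stated above. The quoted passage does not characterize indexes of 31 to 39, so the sketch leaves that band unlabeled.

```python
def personal_index(fictitious_words, attitude_errors, faults_confessed, problem_responses):
    """Combine the four part scores as described in the text."""
    test1 = fictitious_words        # 1 point each, possible score 30
    test2 = 2 * attitude_errors     # 17 items, possible score 34
    test3 = 2 * faults_confessed    # 21 items, possible score 42
    test4 = problem_responses       # 89 items, possible score 89
    return test1 + test2 + test3 + test4


def interpret(index):
    """Illustrative labels for the bands named in the quoted manual."""
    if index >= 50:
        return "much more highly indicative"
    if index >= 40:
        return "suggests an unwholesome trend"
    if index <= 30:
        return "clearly insignificant"
    return "not characterized in the quoted passage"


print(personal_index(30, 17, 21, 89))   # highest possible index
example = personal_index(4, 3, 5, 12)   # a made-up set of part scores
print(example, interpret(example))
```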


6 Op. cit. 




100. Rating Scales for Character and Personality Traits. 


In addition to these two types of more or less objective tests de- 
signed to reveal personality and character differences a number of gen- 
eral rating scales are also in common use. These are of two general 
types. Individuals are ranked according to their standing in regard 
to the specified character traits, or the character traits are rated and 
assigned a rank. The best rating scales tend to lessen the spread of 
teacher judgment and when two or three judgments are averaged are 
found to give fairly reliable estimates for an individual’s traits. 
Hollingworth,⁷ Shen,⁸ and Rugg⁹ have all reported studies on the
reliability of rating character traits. On an average the reliability of
these ratings is about .55 but varies from .40 to .70 on the best
rating scales depending on the traits rated. Whenever possible in using
rating scales it is desirable to have two or three teachers rate the same
pupil and then average the ratings. Rugg reports this to be a fairly
satisfactory method although it is at times difficult to get pupils
rated by three different individuals who know the pupil equally well.
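
The gain to be expected from averaging several ratings may be estimated roughly by the Spearman-Brown formula. That formula is not named in the studies cited above; it is introduced here as an assumption, and it supposes that the raters are equally well informed and equally reliable. The function name is our own.

```python
def pooled_reliability(single_rater_reliability, number_of_raters):
    """Spearman-Brown estimate of the reliability of an average of k ratings.

    Assumes the raters are interchangeable (equal reliability, comparable
    knowledge of the pupil), which actual ratings only approximate.
    """
    r, k = single_rater_reliability, number_of_raters
    return k * r / (1 + (k - 1) * r)


for k in (1, 2, 3):
    print(k, round(pooled_reliability(0.55, k), 2))
```

Under these assumptions an average reliability of .55 for one rater rises to about .71 for two raters and about .79 for three.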

Self-rating scales are also used for allowing pupils to rate them- 
selves. This practice has some value in calling the pupil’s attention 
to desirable traits, and in giving a better understanding of some of 
the desirable and undesirable traits. It has been found that pupils 
tend to rate themselves too high on the desirable traits, but in gen- 
eral the reliability of their ratings is about the same as results ob- 
tained on other types of rating scales. 

Industrial education teachers have a definite need for measures of
character and personality traits. Since the scales are the best
measures available, it seems desirable to use them as one aid in pointing
out and rating character and personality traits of pupils. It is
well to bear in mind, however, that the general reliability of the best
scales is only about .55, and the results obtained are only suggestive
but are to be preferred to the unaided subjective judgment of the
teacher.

The most common use for rating scales in teaching is to study
pupils who are doing unsatisfactory work, although the results are
valuable in developing character and personality traits in all pupils.

7 Hollingworth, H. L., Judging Human Character, D. Appleton and Company,
New York, 1922.

8 Shen, E., “The Reliability Coefficient of Personal Ratings,” Journal of
Educational Psychology, Vol. 16: 232-36, April, 1925.

9 Rugg, H. O., “Is the Rating of Human Character Practicable?” Journal of
Educational Psychology, Vol. 12: 425-38, 485-501, November, December, 1921;
Vol. 13: 30-42, 81-93, January, February, 1922.


Scales have also been developed for rating shop teachers. Scales of
this type can be used by the supervisory officers or for self-rating
and analysis. Teachers and supervisors should keep in mind at all
times in using the results of rating scales that the average or median
of several ratings is more reliable than the rating by one individual.

The values of trait rating are summarized as follows by Dr.
Hughes:¹⁰

1. Trait rating affords the teacher a better understanding of the individual
student. The teacher cannot conscientiously fill out the record unless she knows.

2. It affords a basis for the modification of school and classroom procedures.
If these traits and attitudes are valuable in education, then the school situations
and methods need to be adjusted to their development.

3. It gives a better understanding of special groups, such as above-average
and superior students who are doing poor school work, or below-average students
who are doing superior work, which entitles them to membership in honor
societies, etc.

4. Follow-up of trait rating brings out the fact that teachers’ marks for
scholastic achievement are based, to a large extent, on the student’s possession of
desirable character traits. These data indicate that teachers should be trained to
give marks for scholastic achievement alone, and that other marks should be
devised for character traits and attitudes, because they are important enough to
deserve separate consideration.

5. Cooperation of parents in filling out trait-rating scales for their own children
will tend to bring about a better understanding between the home and the
school, resulting in better cooperation.

6. Self-rating by the students, on the same scale upon which they are being
rated by teachers and parents, will tend to turn the students’ attention to the
importance of cultivating proper traits and attitudes. Students are inclined to
attach importance to things which are being measured, recorded, and used.

7. Justice in marking and teacher judgments will be more apt to be accorded
all groups of students when teachers and counselors have a more accurate
knowledge of the character traits of their students than they could possibly gain by
their own subjective judgments.

8. Trait rating and analysis will result in more scientific counseling, because
it will help to furnish a wider basis of knowledge and information about the
students upon which to predicate advice.

10 Hughes, W. Hardin, “Organized Personnel Research and Its Bearing on
High School Problems,” Journal of Educational Research, Vol. 10: 386-398,
December, 1924.


101. Desirable Traits in Industrial Education.

It is generally agreed that the public school has responsibilities in
developing and helping to establish desirable character and personality
traits. The industrial education teacher and his co-workers in other
instructional fields have a share in this responsibility. We believe,
also, that they have a definite contribution to make. In the first place,
it requires time to develop personality. Personality continues to
develop as the result of an interplay between the original nature of the
individual and the environment, regardless of the conscious attention 
given to it. Accordingly, the problem of the industrial education 
teacher who wishes to help develop desirable personalities in his pupils 
is first to find out what types of personalities adjust themselves
best in our present complex social and economic life. In the second
place, a desirable personality must be cultivated, and the desirable 
traits must be encouraged. The undesirable ones must be weeded 
out and their expression discouraged. However, it must also be re- 
membered that personality development is limited by the innate ca- 
pacity of the individual, and so its development may be expected to 
vary markedly under similar environmental conditions. 


102. Constructing and Using Scales for Rating Personality Traits in
Industrial Education.

Although scales for rating personality traits have not been as
widely used as tests of information, intelligence, and mechanical
aptitude, several usable techniques have been developed which will aid
the teacher who desires to construct and use the trait rating scales.
The first problem in the construction of such scales is the selection
of the traits to be rated. This selection may be based on the
observation and experience of the teacher, conferences with other
instructors, conferences with the administrative officers, talking with
the pupils and their parents, conferences with industrial and social 
leaders in the community, and suggestions from authoritative litera- 
ture. After a rather exhaustive list has been prepared, a number, per- 
haps twenty, of the most important items should be selected for use in 
the rating scale. This selection may be accomplished through the 
use of pooled judgments of teachers and others interested in person- 
ality development. The traits selected must also conform to the gen- 
eral purpose of the course of study and be of such a nature that the 
industrial education teacher will have an opportunity to observe the 
expression of the traits in and about the school. For purposes of 
illustration let us consider the following personality traits which were 
selected by the authors after an analysis of the problem. The traits 
are listed and defined in terms of observable pupil responses. 
Self-reliance. This means that there has been developed in the stu- 
dent the habit of planning tasks carefully and thoughtfully and of 
carrying them out with only necessary assistance. The problem is 
obviously too difficult before assistance is called for by the student. 


Industry. This means a habit of careful, thoughtful work with- 
out loitering or wasting time. 




Readiness to assume responsibility. This means that a task 
though difficult should not be avoided if worth doing, and when once 
undertaken should be carried through to completion. 

Punctuality. This means the ability to arrive on time and fit one- 
self to a program. 

Cooperation. This means a readiness to assist others when they 
need help, and to join in group undertakings. 

Consideration of others. This means a thoughtful attitude in the 
making of things easy and pleasant for others. It involves keeping 
things in order, putting tools away in good condition, and always 
doing a full share of work where others are involved. 

Cleanliness and neatness. This means the ability to keep physi- 
cally clean and neat in both work and dress. 

An optimistic viewpoint toward life. This means an appreciation 
of the joy of living and a belief that life is worth while. 

After the character traits to be rated have been selected and
defined, the next step is to put them into a rating scale which will permit
the greatest amount of objectivity in scoring. There are several
acceptable methods of accomplishing this, depending on whether the
pupils are to be given a relative rank according to their traits, or
whether the character traits of individuals are to be rated and assigned
a rank. Dr. Hughes¹¹ gives the following three procedures used in
rating:

Method I. Normal Distribution. In this method we apply the principle
represented in the “normal curve of distribution.” In any large number of
unselected cases we find a few who possess a given quality in maximum degree, and
a correspondingly small number who possess it in minimum degree. A much
larger number, however, possess the quality in average degree. This general
principle holds whether we consider height, weight, strength, or any other measurable
quality or characteristic. For a scale consisting of five equal steps, we should
have approximately the following distribution of cases on a percentage basis:

Lowest    Inferior    Medium    Superior    Highest
   7         24         38         24          7

But for practical purposes we have adopted a theoretical distribution as
follows:

Lowest    Inferior    Medium    Superior    Highest
  10         20         40         20         10

Assuming that the individuals who are to be rated are unselected and
representative we should have 10 in 100 marked “highest”; 10, “lowest”; 40,
“medium”; and 20, “inferior” and “superior” respectively. A convenient method
of rating such a group is to have the names on individual cards and then
arrange these cards in five piles according to the percentage distributions required.


11 Hughes, W. Hardin, “General Principles of Rating Trait Characteristics,”
Educational Research Bulletin, Pasadena, Vol. 3, Nos. 5 and 6, February-March,
1925.




The rater should as far as possible dismiss from mind every other item of the 
scale and concentrate on the one being rated. 

The method of “normal distribution" is most usable with large and unselected 
numbers. When the number of cases is small and selected the method is defec- 
tive. For this reason another method, based on the same principle, is presented. 
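
The sorting of cards into five piles under Method I may be sketched as follows. The function names are our own, and the sketch assumes that the rater has already arranged the names from lowest to highest on the one trait being rated; it merely cuts that ordering at the 10-20-40-20-10 percentage points. With small classes the pile sizes can only approximate these percentages, which is the defect noted above.

```python
LABELS = ["lowest", "inferior", "medium", "superior", "highest"]
PERCENTS = [10, 20, 40, 20, 10]   # the practical distribution quoted above


def pile_sizes(number_of_pupils):
    """Split n pupils into five piles by rounding the cumulative percentages."""
    sizes, previous_cut, cumulative = [], 0, 0
    for percent in PERCENTS:
        cumulative += percent
        cut = (number_of_pupils * cumulative + 50) // 100   # round to nearest
        sizes.append(cut - previous_cut)
        previous_cut = cut
    return sizes


def sort_into_piles(names_low_to_high):
    """Assign names, already ordered low to high on one trait, to the piles."""
    piles, start = {}, 0
    for label, size in zip(LABELS, pile_sizes(len(names_low_to_high))):
        piles[label] = names_low_to_high[start:start + size]
        start += size
    return piles


print(pile_sizes(100))
print(pile_sizes(20))
```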

Method II. The Master Scale. To use this method, proceed somewhat as
follows: Suppose the trait for which a master scale is to be made is industry.

1. Recall any student known to possess this trait in highest degree. Write his 
name opposite "highest" in the master scale. 

2. Now, recall any student known to possess this trait in lowest degree and 
write his name opposite “lowest” in the master scale. 

3. Then recall any student known to possess the trait in average degree, write 
his name opposite “medium.” 

This gives three definite standards for comparison. The other places in the 
scale may now be filled in with names of two students half way between 
“medium” and “highest” and half way between “medium” and “lowest,” respec- 
tively. You now have a master scale as follows: 


MASTER SCALE FOR INDUSTRY

Rating        Person          Numerical Value *
Highest       John Jones           180
Superior      Dick Brown           140
Medium        Sam Johnson          100
Inferior      Henry James           60
Lowest        Bill Smith            20

* The numerical values here assigned represent the half-way point in each 40 of a
200-point scale.

With this master scale in hand, the teacher is now ready to rate her students
in industry. Suppose Tom Black is to be rated. The teacher quickly decides
whether Tom is as good as John Jones, as poor as Bill Smith, or just about like
Sam Johnson, etc. Master scales for the other traits may be made and used in
the same way.

The advantages of this scale are that it is objective and that small
numbers of students can be rated without immediate reference to the “normal curve
of distribution.” In the long run, however, the percentage distributions should
approximate those given under Method I.
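
The numerical values of the master scale, and the act of rating a student by comparison with the standard persons, may be sketched as follows. The function names are our own; the persons and the 200-point scale in five steps of 40 are those of the master scale for industry.

```python
def step_midpoints(scale_top=200, steps=5):
    """Half-way point of each equal step, from the lowest step upward."""
    width = scale_top / steps
    return [width * i + width / 2 for i in range(steps)]


RATINGS = ["Lowest", "Inferior", "Medium", "Superior", "Highest"]
STANDARD_PERSONS = ["Bill Smith", "Henry James", "Sam Johnson", "Dick Brown", "John Jones"]

# Master scale: standard person -> (rating, numerical value)
MASTER_SCALE = {
    person: (rating, value)
    for person, rating, value in zip(STANDARD_PERSONS, RATINGS, step_midpoints())
}


def rate_by_comparison(most_similar_standard_person):
    """Return the rating and value of the standard person judged most similar."""
    return MASTER_SCALE[most_similar_standard_person]


# The teacher judges Tom Black to be just about like Sam Johnson.
print(rate_by_comparison("Sam Johnson"))
```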


103. A Useful Personality Rating Scale. 


The authors have found the following type of scale valuable in 
rating personality traits in industrial education classes. It will be 
noted that it uses a form of the graphic method with the quality units 
spaced roughly corresponding to the normal distribution curve. 




A GRAPHIC RATING SCALE
FOR PERSONALITY TRAITS IN INDUSTRIAL EDUCATION

Name ________________          Date ________________
Instructor ________________    Rating ________________

Directions: Provision is made for two or more ratings of each personality
trait on the basis of observable pupil responses. Place a check on the line which
in your judgment represents a true estimate of the present status of the trait
being rated.

Minimum                    AVERAGE                    Maximum

SELF-RELIANCE
[graphic rating line]
Does the pupil plan his work carefully and thoughtfully?
[graphic rating line]
Does the pupil conduct the work with only necessary help?
[graphic rating line]
Does the pupil ask for help when the problem is too difficult?

INDUSTRY
[graphic rating line]
Is the pupil in the habit of doing careful and thoughtful work?
[graphic rating line]
Does he loiter or waste time in his work?

READINESS TO ASSUME RESPONSIBILITY
[graphic rating line]
Is the pupil willing to undertake a worthwhile task even though it is difficult?
[graphic rating line]
Does the pupil finish all his work?

PUNCTUALITY
[graphic rating line]
Does the pupil arrive on time to classes?
[graphic rating line]
Does the pupil hand his work in on time?

COOPERATION
[graphic rating line]
Does the pupil help others when help is needed?
[graphic rating line]
Is the pupil active in group undertakings?

CONSIDERATION OF OTHERS
[graphic rating line]
Does the pupil have the habit of making things pleasant for his classmates?
[graphic rating line]
Does the pupil help keep the shop in order?
[graphic rating line]
Does the pupil put tools away in the right places?
[graphic rating line]
When the whole class is involved in some work does he do his share or skip
away?

CLEANLINESS AND NEATNESS
[graphic rating line]
Does the pupil wash clean?
[graphic rating line]
Does the pupil dress neatly and in good taste?
[graphic rating line]
Is the pupil neat in doing his work?

OPTIMISTIC VIEW OF LIFE
[graphic rating line]
Does the pupil have a natural likable smile?
[graphic rating line]
Does the pupil complain about his lot in life?
[graphic rating line]
Is the pupil liked by his classmates?


To secure a total rating, score each question on the basis of the following key: 


Ж НЕ | 7 | 13 | 19 | 23 | 25 | 2 | 


Methods of rating personality traits and of ranking pupils have
been given and illustrated, but thus far self-analysis by the individual
has not been discussed except to mention that it is not significantly
higher in reliability than ratings of traits by outsiders. There is a
tendency for individuals to rate themselves too high on the desirable
traits and too low on the undesirable traits, just as there is a
tendency of persons rating others they know very well to be influenced by
the “halo effect.” This refers to the tendency of the one rating
character traits to rate them all about the same, depending on the rater’s
general opinion of the individual.




SUMMARY 


This chapter presents a number of suggestions for the rating of
personality and character traits. Personality in its broader sense is 
developed in an individual by the interplay of stimuli from the en- 
vironment and the native capacities of the individual. The thesis of 
this chapter is that a desirable personality can be developed over a 
period of time through careful practice and an understanding of what 
is desirable in a well-rounded personality. Industry, cooperation, con- 
sideration for others, self-reliance, readiness to assume responsibility, 
and an optimistic view toward life are suggestive of desirable traits 
to be developed by industrial education teachers as co-workers with 
teachers in other instructional fields. 

Tests of personality traits have not proved as reliable as rating
scales, and for this reason major emphasis is placed on methods of
developing and using rating scales. Authorities report the average
reliability of rating scales at around .55. The most reliable ratings
are obtained when the ratings of three or more judges are pooled.
Some rating scales rank individuals from highest to lowest according
to their personality traits; others give a graphic rating of the
individual traits. In still other types the pupils are allowed to rate
themselves. None of these methods is highly reliable, but they are better
than the unaided subjective judgments of teachers.


SUMMARY EXERCISES FOR DISCUSSION

1. Distinguish clearly between the terms character and personality.

2. What, in your opinion, is the responsibility of the industrial arts teacher for
the development of desirable personality and character traits?

3. Secure from your instructor a copy of the Bernreuter Personality Inventory,
administer the exercises to yourself, and prepare a self-analysis on the basis
of the personality qualities identified in this instrument.

4. Prepare a personality rating scale including the traits which in your judgment
(coupled with information gained from reading in the field) are essential
to success in teaching.


SELECTED REFERENCES



Allport, F. H. and G. W., “Personality Traits: Their Classification and
Measurements,” Journal of Abnormal and Social Psychology, Vol. 16: 6-40, 1921.

Book, W. F., Learning How to Study and Work Effectively. Boston: Ginn and
Company, 1926.

Hartson, L. D., “An Experiment with Rating Scales Based upon a Tentative
Functional Analysis of the Subjects,” Society of College Teachers of
Education, Educational Monographs, No. XIV, 1925, Studies in Education,
University of Chicago Press.

Hughes, W. Hardin, “General Principles of Rating Trait Characteristics,”
Educational Research Bulletin, Pasadena, Vols. 3 and 4, February and March,
1925.

Jones, Arthur J., Principles of Guidance. New York: McGraw-Hill Book
Company, 1930.

Laird, Donald A., Increasing Human Efficiency. New York: Harper and
Brothers, 1925.

McKay, H. D., The Development of Personality Traits. Ph.D. thesis, University
of Chicago, 1930.

Newkirk, Louis V., “The Shop Teacher and Personality,” Industrial Education
Magazine, Vol. 18:370-372, No. 10, October, 1929.

Odell, C. W., Educational Measurements in High School. New York: The
Century Company, 1930.

Valentine, P. F., The Psychology of Personality. New York: D. Appleton and
Company, 1927.


RATING SCALES AND TESTS

Cornell, E. L.; Coxe, W. W.; and Orleans, J. S., Rating Scale for School Habits.
Yonkers, New York: World Book Company, 1927.

Downey, June, The Will-Temperament and Its Testing. Yonkers, New York:
World Book Company, 1923.

Dragoo, Alva W., A Rating Scale for Shop Teachers. Bloomington, Illinois:
Public School Publishing Company.

Hughes, W. Hardin, “A Rating Scale for Individual Capacities, Attitudes, and
Interests,” The Journal of Educational Method, Vol. 3:56-65, October, 1923.

McClusky, F. D., and Dolch, E. W., Study Outline Test. Bloomington, Illinois:
Public School Publishing Company, 1926.

Morris, Elizabeth H., Trait Index L. Bloomington, Illinois: Public School
Publishing Company, 1929.


CHAPTER XIV 
SUMMARIZING THE RESULTS OF TESTING 


Experience in the observation of the work of individual students
and in the use of tests in the classroom leads to the conclusion that
wide differences in pupil accomplishment may be expected. This
means that scores representing objective measures of achievement in
the classroom will vary widely. Since the human mind is not able
to grasp and hold numerous unlike facts in isolation, it is apparent
that some fairly simple and accurate methods of summarizing and
describing such widely varying results are necessary. This process
of summarizing, analyzing, and compressing data so that they may be
given adequate description is the application of statistical methods.

Six statistical techniques are very useful in the analysis and
interpretation of educational test results. They are: (1) the
classification and tabulation of data; (2) the computation and interpretation
of the common measures of central tendency; (3) the computation
and interpretation of the more common measures of variability; (4)
the expression of the extent and nature of the interrelations of
measurable factors; (5) the derivation and use of standards, norms, and
various derived scores for the purposes of comparison and interpretation
of test results; and (6) the use of simple and effective graphic
procedures for the presentation of facts. This chapter summarizes
the discussion of these six points.

1. THE TABULATION OF TEST SCORES 


104. Need for Grouping of Data.

The physical, mental, moral, and social unlikenesses of people
make it impossible to describe them in few words. If all men were
of the same height it would be a simple matter to describe the height
of men. This need for a method of grouping data for convenience in
treating and describing the situation is illustrated in the test scores
given in Table 35. This table shows the scores made by
a group of 27 eighth-grade pupils on the Newkirk-Stoddard Home
Mechanics Test, Form A. An examination of this table shows that
Pupils 7 and 9 made the low and the high scores on this test. The
remaining 25 pupils made scores between these extremes. Even with this
small class, with scores ranging from 2 points to 15 points, it is
apparent that it is difficult for the teacher to secure a clear picture of
the achievement of the class without some treatment of the scores.
One of the very easiest of these procedures is the arrangement of the
test scores in order of size from the highest to the lowest. This is
called ranking the scores. These 27 scores have been ranked in
descending order of size in Table 36. This arrangement enables the teacher
to discover readily the highest and lowest scores, and, after some training
and experience, to pick the scores which are more or less typical for the
group.
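
The ranking of the 27 scores may be sketched as follows. The variable and function names are our own; the scores are those of Table 35, keyed by pupil number, and the result reproduces the arrangement of Table 36.

```python
from collections import defaultdict

# Table 35: pupil number -> test score
SCORES = {
    1: 6, 2: 3, 3: 8, 4: 12, 5: 3, 6: 4, 7: 2, 8: 6, 9: 15,
    10: 10, 11: 7, 12: 6, 13: 6, 14: 9, 15: 5, 16: 7, 17: 11, 18: 6,
    19: 8, 20: 7, 21: 5, 22: 10, 23: 9, 24: 5, 25: 8, 26: 6, 27: 7,
}


def rank_scores(scores):
    """Return (score, [pupil numbers]) pairs from highest score to lowest."""
    pupils_by_score = defaultdict(list)
    for pupil, score in sorted(scores.items()):
        pupils_by_score[score].append(pupil)
    return sorted(pupils_by_score.items(), reverse=True)


for score, pupils in rank_scores(SCORES):
    print(f"{score:>2}  {', '.join(str(p) for p in pupils)}")
```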

The arrangement of test scores in order of size is helpful only
when the number of pupils in the class is relatively small, as 10 or
less. When more pupils are present it usually becomes necessary to
group the scores into the form of a table. This is called making a
frequency distribution, and the table naturally is called a frequency
table.


105. Steps in Preparing a Frequency Table. 


The three essential steps in the grouping of data in a frequency
table are as follows:

1. The determination of the range of the scores.

The range is the difference between the largest score and the
smallest score in the array or series of scores. For the test scores given in
Table 35, the range is 13, the difference between 15, the highest score,
and 2, the lowest score.


TABLE 35

SCORES IN NUMBER OF JOBS RIGHT MADE IN EIGHTH-GRADE CLASS ON
NEWKIRK-STODDARD TEST OF HOME MECHANICS

Pupil     Test      Pupil     Test      Pupil     Test
Number    Score     Number    Score     Number    Score
  1         6         10       10         19        8
  2         3         11        7         20        7
  3         8         12        6         21        5
  4        12         13        6         22       10
  5         3         14        9         23        9
  6         4         15        5         24        5
  7         2         16        7         25        8
  8         6         17       11         26        6
  9        15         18        6         27        7


TABLE 36

SCORES IN TABLE 35 ARRANGED IN DESCENDING ORDER OF SIZE

Test Score    Pupil
    15        9
    12        4
    11        17
    10        10, 22
     9        14, 23
     8        3, 19, 25
     7        11, 16, 20, 27
     6        1, 8, 12, 13, 18, 26
     5        15, 21, 24
     4        6
     3        2, 5
     2        7


2. The determination of the number and size of the groups in the
classification.

The number of groups, or class intervals as they are called, in a
frequency table depends upon the range of the scores, as well as upon
the size of the interval which it seems best to use. No specific rule
covering this situation can be stated. In general it seems safe to say
that it is usually unwise to group data into fewer than 15 to 18
intervals. On the other hand, the use of 25 or more intervals may
introduce an unnecessary amount of labor in making the tabulation, and
many less than 15 may introduce a serious error of grouping. Some
useful suggestions on the relation of the range to the size and number
of the class intervals to use in making a frequency table are given in
Table 37.



TABLE 37 


SUGGESTED RELATION OF RANGE OF SCORES AND SIZE OF CLASS INTERVALS


For a Range of        Use a Class Interval of
25 or less 1 
26 to 69 3 
70 to 125 5 
126 to 175 Li 
15 
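
The first two steps, finding the range and choosing a class interval from it, may be sketched as follows. The function names are our own. The sketch encodes only the suggestions of Table 37 for ranges up to 125 points and declines to answer for wider ranges; the scores are those of Table 35 in pupil order.

```python
# Table 35 scores in pupil order (pupils 1 to 27)
SCORES = [6, 3, 8, 12, 3, 4, 2, 6, 15, 10, 7, 6, 6, 9,
          5, 7, 11, 6, 8, 7, 5, 10, 9, 5, 8, 6, 7]


def score_range(scores):
    """Difference between the largest and the smallest score."""
    return max(scores) - min(scores)


def suggested_interval(range_of_scores):
    """Class interval suggested for a given range (ranges up to 125 only)."""
    if range_of_scores <= 25:
        return 1
    if range_of_scores <= 69:
        return 3
    if range_of_scores <= 125:
        return 5
    raise ValueError("this sketch covers ranges of 125 points or less only")


r = score_range(SCORES)
print(r, suggested_interval(r))
```

For these 27 scores the range is 13, so that a class interval of 1 is indicated.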


Error of Grouping. The so-called error of grouping results from
the practice of putting into the same group, or interval, scores which
are very widely unlike and in which only a few cases are found. It
is greater when the number of scores is small and when they are
found at irregular intervals along the scale. For example, the
tabulation of two scores of 44 into a class interval of three units
(44, 45, and 46) with a mid-point of 45 involves an average error of
grouping of 1.0 point, whereas the tabulation of seven scores of 44,
44, 45, 45, 46, 46, 46 into the same class interval introduces a
smaller error. In placing these seven scores in the same class
interval it is assumed that they will together have an average value
approximately the same as the mid-point of the step (in this case 45).
In this latter example, the error is not serious, since the average of
these seven scores is actually 45.14 instead of 45. In tables with
fewer and larger class intervals, it must be obvious that this error
due to grouping may be much greater. The underlying idea in grouping
scores into 15 to 25 intervals is to organize the scores into a
sufficiently small number of classes so that they may be thought about
effectively, and yet not place them in so few groups that important
differences are covered up or significant errors of grouping are
introduced.
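The two examples of grouping error can be checked with a few lines of code. This is only a sketch: the function name `grouping_error` is an illustrative assumption, not something taken from the text.

```python
def grouping_error(scores, mid_point):
    """Distance between the interval mid-point and the actual mean of the scores."""
    actual_mean = sum(scores) / len(scores)
    return abs(actual_mean - mid_point)

two_scores = [44, 44]
seven_scores = [44, 44, 45, 45, 46, 46, 46]

# Both sets fall in the interval 44-46, whose mid-point is 45.
print(round(grouping_error(two_scores, 45), 2))    # 1.0
print(round(grouping_error(seven_scores, 45), 2))  # 0.14
```

The more evenly the scores fill the interval, the closer their mean lies to the mid-point and the smaller the error.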
3. Tabulating the scores. 
The tabulation of test scores corresponds in many respects to the
filing of letters or cards. The value or size of the score determines
the filing compartment into which it is placed. The necessity for this
exact classification makes clear to the student why it is so important
that the limits of the class intervals be so exactly established in the
preparation of the steps for a table. There must be no question as
to the intervals in which each specific score is placed.

Undoubtedly the best way to clear up the problems of tabulation
is to illustrate by making use of some actual test scores. In Table 35
the scores made by 27 eighth-grade pupils on the Newkirk-Stoddard
Test of Home Mechanics are given. These test scores cover a range
from 1 to 15 points. The real significance of 27 scores with such a
range can not be gathered from the form in which they are given in
Table 35. One way of grouping or compressing these data is
used in Table 36, which shows the number of the individual pupil
who made each specific score. As a matter of fact this table is turned
into a simple frequency table by the process of setting up a scale of
class intervals in place of the actual test scores and substituting
check marks or tabulation marks for the individual pupil numbers.
Table 37 suggests that a class interval with a step of 1 unit be used
in arrays in which the range is 25 points or less. Accordingly, in this
problem, class intervals of 1 with whole numbered mid-points are set
up. Table 38 shows the results of setting up the intervals on this
basis. The tabulation of the scores is shown in this table in the third




column. In tabulating test scores the common practice is to make a
vertical mark (/) for each score of a given magnitude. When the
frequency of any given score reaches 5, the fifth frequency is indicated
by a diagonal mark crossing four of the vertical tabulation marks. In
this manner, the frequencies are conveniently grouped by 5's, which
simplifies the summation of frequencies in large populations.
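The tabulation procedure can be sketched in code. The example below assumes the 27 home-mechanics scores of Table 35, entered here in descending order, and uses unit class intervals with whole-numbered mid-points. The helper names `tally` and `frequency_table` are illustrative, and a run of five strokes stands in for the crossed group of five, which plain text cannot show.

```python
from collections import Counter

# The 27 scores from Table 35, in descending order.
scores = [15, 12, 11, 10, 10, 9, 9, 8, 8, 8, 7, 7, 7, 7,
          6, 6, 6, 6, 6, 6, 5, 5, 5, 4, 3, 3, 2]

def tally(frequency):
    """Render a frequency as tabulation marks grouped by fives."""
    fives, rest = divmod(frequency, 5)
    groups = ["/////"] * fives + (["/" * rest] if rest else [])
    return " ".join(groups)

def frequency_table(values):
    """Rows of (lower limit, mid-point, upper limit, frequency), highest first."""
    counts = Counter(values)
    return [(m - 0.5, m, m + 0.5, counts.get(m, 0))
            for m in range(max(values), min(values) - 1, -1)]

table = frequency_table(scores)
for lower, mid, upper, f in table:
    print(f"{lower:4.1f}-{upper:4.1f}  {mid:2d}  {tally(f):<8}  {f}")
print("N =", sum(row[3] for row in table))
```

Empty intervals (here 13 and 14) are kept in the table with a frequency of zero, so that the scale remains unbroken.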

A further illustration of the tabulation of a series of test scores
is given in Table 39. The scores used in this case represent total

TABLE 38

DATA FROM TABLES 35 AND 36 ARRANGED IN FREQUENCY DISTRIBUTION

Class          Mid-      Tabulation    Frequencies
Intervals      Points    Marks         (f)
14.5-15.5      15        /             1
13.5-14.5      14                      0
12.5-13.5      13                      0
11.5-12.5      12        /             1
10.5-11.5      11        /             1
 9.5-10.5      10        //            2
 8.5- 9.5       9        //            2
 7.5- 8.5       8        ///           3
 6.5- 7.5       7        ////          4
 5.5- 6.5       6        ///// /       6
 4.5- 5.5       5        ///           3
 3.5- 4.5       4        /             1
 2.5- 3.5       3        //            2
 1.5- 2.5       2        /             1

Total, or N = 27


comprehension scores of a ninth-grade class on the Iowa Silent Reading
Test, Advanced Examination.¹ The test scores of the 71 students
comprising this class are as follows: 104, 129, 94, 87, 118, 146, 109,


163, 140, 125, 58, 86, 102, 103, 133, 77, 117, 114, 99, 110, 103, 93, 
123, 137, 89, 118, 117, 107, 109, 117, 114, 162, 135, 115, 101, 109, 
150, 130, 100, 109, 140, 102, 110, 148, 94, 199, 139, 115, 105, 125, 
104, 141, 127, 100, 107, 116, 136, 142, 96, 103, 111, 145, 99, 105, 108, 


98, 126, 112, 152, 114, 109.
The range of the scores is found by subtracting 58, the smallest


1 Iowa Silent Reading Test, Advanced, Forms A and B, World Book
Company, Yonkers, New York.




score, from 163, the largest score. The difference is 105. Table 37
suggests class intervals of 5 units for such a range. Accordingly, a
class interval large enough to accommodate a score of 163 points is
set up at the top of the table. Using a mid-point divisible by the
size of the step means that the mid-point of this interval will be 165
and that the limits of the interval will be 162.5 to 167.5 points. The


TABLE 39

COMPREHENSION SCORES ON IOWA SILENT READING TEST, ADVANCED EXAMINATION

Class          Mid-      Tabulation       Frequencies
Intervals      Points    Marks            (f)
162.5-167.5    165       /                 1
157.5-162.5    160       /                 1
152.5-157.5    155                         0
147.5-152.5    150       ///               3
142.5-147.5    145       //                2
137.5-142.5    140       /////             5
132.5-137.5    135       ////              4
127.5-132.5    130       //                2
122.5-127.5    125       /////             5
117.5-122.5    120       ///               3
112.5-117.5    115       ///// ////        9
107.5-112.5    110       ///// ///// /    11
102.5-107.5    105       ///// ///         8
 97.5-102.5    100       ///// ///         8
 92.5- 97.5     95       ////              4
 87.5- 92.5     90       /                 1
 82.5- 87.5     85       //                2
 77.5- 82.5     80                         0
 72.5- 77.5     75       /                 1
 67.5- 72.5     70                         0
 62.5- 67.5     65                         0
 57.5- 62.5     60       /                 1

N = 71


lowest class interval extends from 57.5 to 62.5, which provides for the
score of 58 at the lower end of the range of scores. Table 39 shows the
entire range of the class intervals, the mid-points, the tabulation
marks, and the frequencies, based on these 71 reading-test scores.²
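The rule of using mid-points divisible by the size of the step can be sketched as a small function. The name `class_interval` is an illustrative assumption, and the sketch assumes whole-number scores, so that no score falls exactly on a class limit.

```python
import math

def class_interval(score, size):
    """Return (lower limit, mid-point, upper limit) of the interval holding `score`.

    Mid-points are multiples of `size`; the limits lie half a step on
    either side of the mid-point.
    """
    mid = size * math.floor((score + size / 2) / size)
    return (mid - size / 2, mid, mid + size / 2)

top = class_interval(163, 5)     # largest reading score
bottom = class_interval(58, 5)   # smallest reading score
print(top)     # (162.5, 165, 167.5)
print(bottom)  # (57.5, 60, 62.5)

# Number of class intervals needed to span the whole range.
print((top[1] - bottom[1]) // 5 + 1)  # 22
```

With a step of 5, the 105-point range of the reading scores is covered by 22 intervals, which falls inside the 15 to 25 intervals recommended earlier.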


II. MEASURES OF CENTRAL TENDENCY


The second of the important statistical techniques required in
connection with the summary of educational test results deals with the
computation and interpretation of the common measures of central
tendency. This is the process of computing a single measure or term
which may be used in describing the complete array of data in the
table. The term central tendency arises through the fact that these
measures are commonly found near the center of the distributions of
scores when the scores are arranged in order of size.

Three measures of central tendency are commonly used in the
interpretation of educational tests. These are: the arithmetic mean, the
median, and the mode. In general, these measures are named in the
order of their use in present-day test interpretation. As a matter of
fact, the mode is considered to be such an unreliable measure that it
is rarely used in educational measurements. In this discussion,
consideration will be given only to the first two of the measures of
central tendency, namely, the arithmetic mean and the median.


106. The Arithmetic Mean.

Almost everyone knows how to find a simple arithmetic mean or
average, as it is commonly called, by dividing the sum of a series of 
measures by the number of measures. However, not everyone knows 
that there is a rapid and reasonably accurate method of computing 
the arithmetic mean of large numbers of measures in frequency tables. 
The speed and the satisfactory accuracy with which this important 
measure of central tendency may be computed for distributions of 
large numbers of cases has made it one of the most popular and useful 
of the measures of central tendency. 

The calculation of the arithmetic mean from a frequency distribution
requires a somewhat different concept of the measure than that
used when it is computed from ungrouped data. For the 27 test scores
given in Table 35 the sum of the measures is 191. The arithmetic 
mean, the result of dividing 191 by 27, is 7.07. When computed in 
this way the arithmetic mean does not especially require definition. 
It is easier simply to state how it is found. When computed by the 


2 A much more detailed discussion of the problem of tabulating test scores
will be found in Greene's Work-Book in Educational Measurements (Longmans).
Extensive practice in tabulation of test scores is given in Problems 1, 2, 3, and 4
of Work-Unit I of the above-mentioned Work-Book.




so-called shorter method, the arithmetic mean is defined as a point on
the scale such that the sum of the deviations of the values larger
than it exactly equals the sum of the deviations of the values smaller
than it. For those who think most clearly in concrete terms this
arithmetic mean may be considered as the point in a beam of irregular
but increasing thickness at which the fulcrum must be placed to bring it
into perfect balance.

The actual computation of the arithmetic mean from a frequency
table proceeds on the principle of the mathematical determination of
the proper position of the fulcrum of such a beam from data resulting
from a trial balance. That is, the beam is suspended on the fulcrum
as nearly as can be determined by estimation; then the actual amount
that the beam is out of balance is measured. Experience shows that
the fulcrum must be moved in the direction of the heavy end of the
beam in order to bring it into balance. The exact amount of this
shift in position depends upon the difference in the forces bearing on
the two ends of the beam. If there are 60 units of unbalanced force
tending to turn a beam in a certain direction and there are 40 measures
(scores) contributing to the distribution, it means that the fulcrum
must be moved toward the heavy end of the beam an amount equal
to 1.5 scale units of length (60 ÷ 40 = 1.5). This should bring the
beam into balance.
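The balance idea can be put in a single line of algebra. The notation here is an assumption introduced for illustration rather than taken from the text: A is the assumed mean (the trial position of the fulcrum), d is the deviation of each step from A counted in steps, f is the frequency in that step, N is the number of cases, and s is the size of the step in score units.

$$M = A + s \cdot \frac{\sum f d}{N}$$

With 60 units of unbalanced force and 40 scores, the fraction is 60/40 = 1.5, so the fulcrum moves 1.5 steps toward the heavy end. The sign of the sum gives the direction of the move.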


107. Steps in Computation of Mean. 

The specific steps to be taken in the computation of the arithmetic 
mean by the shorter method are as follows: 

Step 1. Select the mid-point of some central step on the scale as
an assumed mean. Call this point zero. (In computing the arithmetic
mean from a frequency distribution the scores in a given step
are all assumed to be grouped at the exact center of the step, hence
the assumption of the mid-point of the step as the zero.)

Step 2. Mark off steps of deviation above and below this assumed
zero point, maintaining the algebraic signs.

Step 3. Multiply the frequency in each step by the deviation of the
step. Carry the algebraic signs of these deviations. Those above the
zero step should be plus; those below it should be minus.

Step 4. Find the algebraic sum of these deviations, keeping the sign
of the result.

Step 5. Divide this value by the number of cases in the distribution,
and multiply this result by the number of units in each step.
This result is the correction (c).

Step 6. Depending upon the sign of this correction (c), increase or
decrease the value of the mid-point of the step taken as the zero by
the amount of c. This should give the arithmetic mean.

This procedure may be made clear by actually working the mean 
of the test scores tabulated in Table 38. The work is shown in detail 
in the accompanying table (Table 40). 


TABLE 40

DISTRIBUTION OF SCORES TAKEN FROM TABLE 35 AND THE CALCULATION OF THE
ARITHMETIC MEAN OF THE 27 SCORES

Class Intervals         f       d       fd
14.5 (15) 15.5          1      +7       +7
13.5 (14) 14.5          0      +6        0
12.5 (13) 13.5          0      +5        0
11.5 (12) 12.5          1      +4       +4
10.5 (11) 11.5          1      +3       +3
 9.5 (10) 10.5          2      +2       +4
 8.5  (9)  9.5          2      +1       +2
 7.5  (8)  8.5          3       0      (+20)
 6.5  (7)  7.5          4      -1       -4
 5.5  (6)  6.5          6      -2      -12
 4.5  (5)  5.5          3      -3       -9
 3.5  (4)  4.5          1      -4       -4
 2.5  (3)  3.5          2      -5      -10
 1.5  (2)  2.5          1      -6       -6
                     N = 27            (-45)

Step 1. Assumed mean = 8.0.
Step 2. Lay off deviations.
Step 3. Add plus and minus fd's.
Step 4. Find algebraic sum of fd's.
        -45 + 20 = -25.
Step 5. Divide this algebraic sum by N.
        -25 / 27 = -0.926.
        Multiply by size of the step.
        -0.926 × 1 = -0.926.
Step 6. 8.000 - 0.926 = 7.074 = Arithmetic mean.
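The six steps can be followed in a short program. This is a sketch under stated assumptions: the function name `mean_short_method` and its arguments are illustrative, and the frequencies are those of Table 38 listed from the top interval (mid-point 15) down to the bottom interval (mid-point 2).

```python
def mean_short_method(mid_points, frequencies, assumed_mean, step_size=1):
    """Arithmetic mean of grouped data by the assumed-mean (shorter) method."""
    n = sum(frequencies)
    # Steps 2-4: deviations in steps, weighted by frequency, summed with signs.
    sum_fd = sum(f * (m - assumed_mean) / step_size
                 for m, f in zip(mid_points, frequencies))
    # Step 5: divide by N and convert back to score units; this is c.
    correction = sum_fd / n * step_size
    # Step 6: apply the correction to the assumed mean.
    return assumed_mean + correction, sum_fd, correction

mid_points = list(range(15, 1, -1))            # 15, 14, ..., 2
frequencies = [1, 0, 0, 1, 1, 2, 2, 3, 4, 6, 3, 1, 2, 1]

mean, sum_fd, c = mean_short_method(mid_points, frequencies, assumed_mean=8)
print(sum_fd, round(c, 3), round(mean, 3))     # -25.0 -0.926 7.074
```

Any other mid-point could be chosen as the assumed mean. The correction changes, but the corrected mean comes out the same.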


108. The Median. 

The simplification of the work involved in computing the arithmetic
mean has done much to stimulate its general use in the interpretation
of educational test results in place of the median. However,
the ease with which the median may be obtained, and the fact that it
does not give undue weight to extreme scores as does the arithmetic
mean, have made it a popular measure for use in educational
measurements.




Some confusion has been created in the minds of students and 
teachers through a lack of consistency in the methods of computing the 
median. For many years it was common practice to instruct users of 
tests to arrange the test papers for a class with the scores in descend- 
ing order of size, and take the score on the middle paper as the score 
best representing the achievement of the class. For a long time this 
score was called the median. As a matter of fact, the measure of 
central tendency obtained in this way from ungrouped data is a crude 
median, but in order to distinguish it from the true median computed
from data in a frequency distribution there is a tendency to call it the
mid-measure. The mid-measure is a counting median found from 
ungrouped data. The median is always computed from grouped data. 

Computing the Mid-Measure. By definition the mid-measure is 
the score of the middle paper when the number of test papers is odd, 
and the average of the two scores nearest the middle when the num- 
ber is even, assuming that the test papers are arranged in definite 
order of magnitude. The method of computing this very simple 
measure may be illustrated by referring to the data given in Table 35. 
The 27 test scores given in this table arranged in descending order 
of size are as follows: 15, 12, 11, 10, 10, 9, 9, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6,
6, 6, 6, 5, 5, 5, 4, 3, 3, 2. The mid-measure is found by counting off
the scores until the middle paper is reached. In this case, it will be
the score on the fourteenth paper, or 7 points. If there were only
26 papers and the high score of 15 were missing, the average of the
thirteenth paper from either end of the scale would be used as the
mid-measure.³ Under these conditions the mid-measure would thus
be the average of 6 and 7, or 6.5 points.
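The counting procedure for the mid-measure is easy to express in code. The function name `mid_measure` is an illustrative assumption.

```python
def mid_measure(scores):
    """Middle score of an ordered array, or the average of the two middle scores."""
    ordered = sorted(scores, reverse=True)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                          # odd number of papers
    return (ordered[middle - 1] + ordered[middle]) / 2  # even number of papers

scores = [15, 12, 11, 10, 10, 9, 9, 8, 8, 8, 7, 7, 7, 7,
          6, 6, 6, 6, 6, 6, 5, 5, 5, 4, 3, 3, 2]

print(mid_measure(scores))      # 7   (fourteenth of 27 papers)
print(mid_measure(scores[1:]))  # 6.5 (26 papers, high score of 15 removed)
```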

Computing the Median. The median is defined as a point on the
scale such that exactly 50 per cent of the cases in the distribution are
above it and 50 per cent of the cases are below it. The median is
distinguished from the mid-measure by the fact that the former is a
point on a scale whereas the latter is an actual score on a test paper
(or the average of the two scores lying nearest the middle paper).
The fact that the score on the middle paper of a series is not the same
thing as the middle point in the scale of a frequency table of the same
scores makes it important that the two types of measures be defined
and distinguished in use. It will be a movement in the direction of
uniformity of interpretation of test results if the median is always
understood as being computed from data in a frequency distribution.


3 See Problem 11 in Greene, Work-Book in Educational Measurements
(Longmans), for additional drill on computing the mid-measure.


In the earlier discussion of methods of tabulation, and particularly
in the explanation of the computation of the arithmetic mean, it was
pointed out that all the measures falling in a given step are assumed to
have the value of the mid-point of that step. This is necessary since
the computation of the arithmetic mean involves the correction of an
assumed mean and this correction may take place in either a positive
or negative direction. Now, in computing the median a very different
assumption is made, and since this point frequently causes considerable
difficulty and confusion, the reasons for making it are explained
here in some detail. Since the median is a counting measure and is
obtained by counting into the distribution until a point is reached
which throws one-half of the frequencies below it, it is necessary to
assume that all scores assigned to a given step are distributed
uniformly throughout the step. When working with the median, or
measures of a similar character such as percentiles, all scores are
assumed to be scattered through the step in this uniform manner. It
may help to think of the steps or class intervals as air-tight
compartments, and the scores or frequencies assigned to the steps as a
volatile gaseous substance which expands and completely fills the
compartment, regardless of how many or how few the frequencies may be.
If four scores are assigned to a given step, each one of the cases
represents one-fourth of the total area of the step. If there are 20
cases per step, each case is considered to represent one-twentieth of
the area of the step in computing the median.

To find the median of a series of scores arranged in a frequency
table take the following steps:


Step 1. Divide the total number of cases in the distribution (N)
by 2 to determine 50 per cent of the cases. (See Table 38. N/2 is
13.5.)

Step 2. Beginning at the bottom of the column of frequencies,
count up the frequencies as far as possible without exceeding the
half-sum (N/2). (In Table 38 the frequencies 1 + 2 + 1 + 3 + 6 equal
13, which is still less than the half-sum, 13.5.)

Step 3. Take the difference between the half-sum and the subtotal.
(In this example the difference between 13.5 and 13 is 0.5
point.) This difference shows the number of cases which must be
taken from the step in which the median is located. Since there are
four cases in the next step, the step with the limits 6.5 to 7.5, one-half
a case out of these four cases must be taken. Thus the median is
located 0.5/4 or one-eighth of the way through the step. This shows
the proportion of the step which must be passed in counting the
frequencies in order to reach a point on the test scale such that exactly
one-half of the cases lie below it and one-half lie above it.

Step 4. Since the four cases in this step are assumed to be
distributed uniformly throughout the step, the fraction 0.5/4, or
one-eighth, represents the fraction of the step which must be passed in
order to reach the point known as the median. The fraction one-eighth
is equivalent to the decimal 0.125. Since this value 0.125 is
in steps and the steps in this table are one-point intervals this value
must be multiplied by 1. This means that the median is 0.125 unit
beyond (above) the lower limit of the step into which this fractional
unit is taken.

Step 5. The beginning (lower limit, since the frequencies were
counted from the lower end of the distribution) of the step having the
four frequencies is 6.5. Therefore the 0.125 unit must be added to the
value 6.5. The median thus becomes 6.625. For practical purposes
the decimal may be rounded to 6.63.

Step 6. In statistical work of all kinds accuracy is extremely
important. It is therefore very desirable to check all computations. The
calculation of the median may be conveniently checked by the simple
process of adding the frequencies down from the top of the distribution.
In this case the interpolation would be 0.875 and would be
subtracted from the top of the step, or from 7.5.⁴


The actual work of computing the median of the silent-reading
scores given in Table 39 is given in detail in Table 41.
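The counting and interpolation steps can be sketched as a function. The name `grouped_median` and its arguments are illustrative assumptions. The frequencies are entered from the lowest interval upward, first for the home-mechanics scores of Table 38 and then for the reading scores of Table 39.

```python
def grouped_median(lower_limits, frequencies, step_size):
    """Median of a frequency table whose rows run from the lowest interval up."""
    half_sum = sum(frequencies) / 2                   # Step 1
    subtotal = 0
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and subtotal + f >= half_sum:        # the median lies in this step
            fraction = (half_sum - subtotal) / f      # Steps 3 and 4
            return lower + fraction * step_size       # Step 5
        subtotal += f                                 # Step 2
    raise ValueError("the distribution contains no cases")

# Table 38: unit intervals with lower limits 1.5, 2.5, ..., 14.5.
limits_38 = [1.5 + i for i in range(14)]
freqs_38 = [1, 2, 1, 3, 6, 4, 3, 2, 2, 1, 1, 0, 0, 1]
print(grouped_median(limits_38, freqs_38, 1))            # 6.625

# Table 39: intervals of 5 with lower limits 57.5, 62.5, ..., 162.5.
limits_39 = [57.5 + 5 * i for i in range(22)]
freqs_39 = [1, 0, 0, 1, 0, 2, 1, 4, 8, 8, 11, 9, 3, 5, 2, 4, 5, 2, 3, 0, 1, 1]
print(round(grouped_median(limits_39, freqs_39, 5), 2))  # 112.27
```

The check in Step 6 amounts to running the same interpolation from the top of the table. Both directions must arrive at the same point on the scale.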


109. Uses of the Arithmetic Mean and the Median.

The question which of these very useful measures of central
tendency to use in test interpretation frequently arises. In many
respects there is not a great deal of choice. Prior to the general
adoption of the shorter methods of computing the arithmetic mean the
median was very popular. It is simple to compute, and furthermore, is
considered especially suitable for test interpretation because of the
fact that widely deviating scores do not unduly influence it. On the
other hand, the arithmetic mean is now very easily calculated, and for
most experimental purposes it appears to be quite important to have each
individual score given weight in the results in direct proportion to
its magnitude. The greatly increased use of educational tests for ex-


4 This and a number of other points in the computation of the median are
discussed and explained in connection with Illustrations 7, 8, and 9 (pages 35 to 38
inclusive) in the Work-Book in Educational Measurements by H. A. Greene
(Longmans). Problems 12, 13, 14, and 15 in this Work-Book also provide
extensive drill on the finding of medians from all types of distributions.




TABLE 41

DISTRIBUTION OF TEST SCORES OF 71 NINTH-GRADE PUPILS. TOTAL COMPREHENSION
SCORES FROM IOWA SILENT READING TEST, ADVANCED

Class Intervals      f
162.5-167.5          1
157.5-162.5          1
152.5-157.5          0
147.5-152.5          3
142.5-147.5          2
137.5-142.5          5
132.5-137.5          4
127.5-132.5          2
122.5-127.5          5
117.5-122.5          3
112.5-117.5          9
107.5-112.5         11
102.5-107.5          8
 97.5-102.5          8
 92.5- 97.5          4
 87.5- 92.5          1
 82.5- 87.5          2
 77.5- 82.5          0
 72.5- 77.5          1
 67.5- 72.5          0
 62.5- 67.5          0
 57.5- 62.5          1
                N = 71

Step 1: 71 / 2 = 35.5 = half-sum.
Step 2: 1 + 1 + 2 + 1 + 4 + 8 + 8 = 25. Subtotal.
Step 3: 35.5 - 25 = 10.5.
Step 4: 10.5 / 11 = 0.954; 0.954 × 5 (size of step) = 4.77.
Step 5: 107.5 + 4.77 = 112.27 = median.
Step 6: Check.
        35.5 - 35 = 0.5.
        0.5 / 11 × 5 = 0.23.
        112.5 - 0.23 = 112.27 = median.
perimental purposes has thus naturally tended to increase the
popularity of the arithmetic mean as a measure of central tendency. In
general, and in the absence of any other guiding principle, use the
median in all interpretations or comparisons in which the median
itself was used in securing the basis for comparison. That is, if test
results are to be compared with test norms which are based upon
medians, then the medians of the test results should be computed and
used. Comparative norms based upon means may well be compared
with class means. For most experimental purposes the arithmetic
means should be used, particularly where the scores of individuals are
compared with their own scores obtained under experimental controls.
In most experimental studies it is desirable for all measures to receive
consideration, and furthermore, in most such studies other measures,
such as the standard deviation, are required. Since these additional
statistical measures are usually based upon the same processes as those
used in calculating the mean, the arithmetic mean is the logical measure
to use.
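The different sensitivity of the two measures to a widely deviating score can be seen in a small experiment. The altered array below is hypothetical: it replaces the top home-mechanics score of 15 with an invented score of 40. Note also that `statistics.median` works on ungrouped scores, so it returns what the text calls the mid-measure rather than the interpolated median.

```python
from statistics import median

def arithmetic_mean(scores):
    """Sum of the measures divided by the number of measures."""
    return sum(scores) / len(scores)

scores = [15, 12, 11, 10, 10, 9, 9, 8, 8, 8, 7, 7, 7, 7,
          6, 6, 6, 6, 6, 6, 5, 5, 5, 4, 3, 3, 2]
altered = [40] + scores[1:]   # hypothetical: top score of 15 replaced by 40

print(round(arithmetic_mean(scores), 2), median(scores))    # 7.07 7
print(round(arithmetic_mean(altered), 2), median(altered))  # 8.0 7
```

One extreme score moves the mean by nearly a full point while the middle score is unchanged, which is the behavior described above.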




III. MEASURES OF VARIABILITY 


The need for measures of variability in the interpretation of test
scores arises through the fact that two groups of pupils may earn
scores on a test which will have the same medians or arithmetic means
and yet represent distinctly different types of instructional situations.
At least two types of descriptive measures of a distribution are needed
before all its essential features can be revealed. The measures of
central tendency reveal the points on the scale where the typical scores
are most likely to be found. Some method of expressing variability
is required to reveal differences in range of talent.

The two groups of scores presented as Class A and Class B in 
Table 42 illustrate this situation very clearly. The means of the two 


TABLE 42

ILLUSTRATION OF NEED FOR MEASURES OF VARIABILITY

Class A              Class B
  122                   98
  116                   96
  108                   95
  101                   93
   96                   90
   92                   89
   89                   87
  ---                  ---
   86      Means        86
  ---                  ---
   83                   85
   80                   83
   76                   82
   71                   79
   64                   77
   56                   76
   50                   74

series of scores are identical, each being 86. The range of the scores
for Class A is 72 (122 - 50), which is exactly three times the range
(98 - 74 = 24) of the Class B scores. The quartile deviation
computed from the ungrouped scores is 15 for Class A and 7 for Class B.
The standard deviations of the scores are 20.17 and 7.3 for the Class
A and Class B scores respectively. Even the most inexperienced
teacher or student must recognize that very different ranges of ability
are present in these two classes and that correspondingly different
instructional problems are presented to the teacher.
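A short check of the Table 42 figures shows two arrays with the same mean and very different spreads. The function name `describe` is an illustrative assumption.

```python
class_a = [122, 116, 108, 101, 96, 92, 89, 86, 83, 80, 76, 71, 64, 56, 50]
class_b = [98, 96, 95, 93, 90, 89, 87, 86, 85, 83, 82, 79, 77, 76, 74]

def describe(scores):
    """Return (arithmetic mean, range) for an array of scores."""
    return sum(scores) / len(scores), max(scores) - min(scores)

print(describe(class_a))  # (86.0, 72)
print(describe(class_b))  # (86.0, 24)
```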


110. The Range. 


Of the three commonly used measures of variability mentioned in
the illustration in the previous paragraph the range is the easiest to
find and the least useful measure. The range is the scale distance
between the lowest and the highest scores in an array. The very
definition of the range makes it apparent that it is one of the least
reliable measures, since it is so readily affected by the fluctuation of
either of the extreme scores. In arrays of test scores or frequency
tables in which the scores fall into line quite regularly the range may
be a fairly consistent measure. In the illustration given in Table 42
the range is almost as effective in revealing the spread of ability as
the standard deviation, which is usually considered to be one of the most
reliable measures of dispersion. This, without doubt, may be traced
to the consistency with which the test scores vary above and below
the means. It may be sufficient to point out here that the range is
rarely used as an evidence of dispersion or variability in the
interpretation of educational-test scores since such scores rarely fit
into the scale with the consistency shown in the illustration in Table 42.


111. Quartile Deviation. 


The particular merit of the semi-interquartile range or quartile
deviation (Q) as a measure of the variability of test scores lies in the
fact that it utilizes the range of the middle half of the cases rather
than the range of the extremes. In actual practice the range of the
middle half of the cases is found by counting off frequencies from the
lower end of the distribution until a point cutting off 25 per cent of
the cases is located. The method of finding this point is identical
with the procedure in computing the median except that only 25 per
cent instead of 50 per cent of the cases in the distribution are
considered. This point is commonly designated as $Q_1$. A point on the
scale which cuts off 25 per cent of the cases from the top of the
distribution is found in a similar way. This is known as $Q_3$. The
remaining cases included between these two points are the middle
50 per cent. The reliability of this measure lies therefore in the fact
that it is based upon the portion of the distribution in which the
density of the population is greatest.

The quartile deviation (Q) is found by taking one-half of the
difference in the scale values of the points $Q_3$ and $Q_1$. The formula
for this measure of variability is

$$Q = \frac{Q_3 - Q_1}{2}$$

The computation of Q is illustrated in terms of the data presented in
Table 41, and since the procedures are essentially the same as those
used in finding the median the steps are summarized very briefly.

Step 1. Find 25 per cent of the cases. In this problem one-fourth
of the cases equals 17.75.

Step 2. Summate the frequencies beginning at the bottom until a
point not in excess of 17.75 cases is reached. 1 + 1 + 2 + 1 + 4 + 8
= 17. The difference between this subtotal and 17.75 is 0.75 case.

Step 3. 0.75/8 times 5 equals 0.47 unit.

Step 4. Add 0.47 unit to the beginning of the interval in which
the 8 cases are found. Thus, 102.5 + 0.47 equals 102.97, which is $Q_1$
for this distribution.

Step 5. Summate the frequencies beginning at the top of the
distribution. 1 + 1 + 3 + 2 + 5 + 4 = 16. The difference between this
subtotal and N/4, or 17.75, is 1.75 cases.

Step 6. 1.75/2 times 5 equals 4.38 units.

Step 7. Since this computation of $Q_3$ is proceeding from the top
of the distribution the 4.38 units must be subtracted from the top of
the step into which the interpolation is made. Thus, 132.5 - 4.38 =
128.12, which is $Q_3$ for this distribution.

Step 8. $Q_3 - Q_1$, or 128.12 - 102.97, equals 25.15, which is twice
the value of the semi-interquartile range. Thus 25.15 divided by 2
equals 12.58, or the quartile deviation for this distribution.
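The quartile points can be found with the same interpolation used for the median. The sketch below rests on a few assumptions: the function name `point_cutting_off` is illustrative, the frequencies are those of Table 41 entered from the lowest interval upward, and both quartile points are counted from the bottom of the distribution, which is equivalent to counting the upper one down from the top.

```python
def point_cutting_off(lower_limits, frequencies, cases_below, step_size):
    """Scale point below which `cases_below` cases lie (rows run lowest first)."""
    subtotal = 0
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and subtotal + f >= cases_below:
            return lower + (cases_below - subtotal) / f * step_size
        subtotal += f
    raise ValueError("cases_below exceeds the number of cases")

lower_limits = [57.5 + 5 * i for i in range(22)]
frequencies = [1, 0, 0, 1, 0, 2, 1, 4, 8, 8, 11, 9, 3, 5, 2, 4, 5, 2, 3, 0, 1, 1]
n = sum(frequencies)                                           # 71 cases

q1 = point_cutting_off(lower_limits, frequencies, n / 4, 5)      # lower quartile point
q3 = point_cutting_off(lower_limits, frequencies, 3 * n / 4, 5)  # upper quartile point
q = (q3 - q1) / 2                                              # quartile deviation
print(q1, q3, q)
```

Carried to full precision the program gives 102.96875 and 128.125 for the two quartile points, and 12.578125 for the quartile deviation.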


The values Q and the median ($Q_2$) are frequently confused. They
are quite different measures, however. The median or fiftieth
percentile is a measure of central tendency; Q is a measure of the
variation. Q expresses the variability of an array in terms of the
average distance from the center of the distribution it is necessary to
go in either direction to include the middle half of the cases.


112. Standard Deviation. 


Such simple devices as the range and quartile deviation (Q) are
sufficiently exact for many ordinary situations involving the
interpretation of test results. However, other statistical analyses
demand more refined measures of variability. For this type of work the
standard deviation is generally used. The standard deviation is the
square root of the mean of the squares of the deviations from the mean
of a distribution. Expressed in symbols the standard deviation is

$$\sigma = s\sqrt{\frac{\Sigma FD^2}{N} - c^2}$$

in which $\Sigma FD^2$ equals the deviations expressed in the form of
the sum of the products of the frequencies at each step by the square of
the deviation of each step from the assumed mean; N equals the number
of cases in the distribution; c equals the correction used in computing
the arithmetic mean, expressed in step units; and s represents the size
of the class interval of the distribution in units.

The standard deviation may be computed about any common
measure of central tendency, although in common practice it is usually
computed about the arithmetic mean. There is at least a theoretical
advantage in using the mean as the point around which to determine
the standard deviation. The arithmetic mean is the point in a
distribution about which the sum of the squared deviations is the least.
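The grouped-data formula can be tried on the home-mechanics distribution of Table 38. This sketch assumes the formula reads σ = s·√(ΣFD²/N − c²), with the deviations D and the correction c both counted in steps from the assumed mean. The function name `grouped_sd` is illustrative, and the result is compared with a direct computation from the individual scores.

```python
import math
from statistics import pstdev

def grouped_sd(mid_points, frequencies, assumed_mean, step_size=1):
    """Standard deviation of grouped data, taken from an assumed mean."""
    n = sum(frequencies)
    deviations = [(m - assumed_mean) / step_size for m in mid_points]  # D, in steps
    sum_fd = sum(f * d for f, d in zip(frequencies, deviations))
    sum_fd2 = sum(f * d * d for f, d in zip(frequencies, deviations))
    c = sum_fd / n                                 # correction, in steps (assumption)
    return step_size * math.sqrt(sum_fd2 / n - c * c)

mid_points = list(range(15, 1, -1))                # 15, 14, ..., 2
frequencies = [1, 0, 0, 1, 1, 2, 2, 3, 4, 6, 3, 1, 2, 1]

sigma = grouped_sd(mid_points, frequencies, assumed_mean=8)
print(round(sigma, 2))                             # 2.84

# Direct check from the individual scores, expanded from the table.
expanded = [m for m, f in zip(mid_points, frequencies) for _ in range(f)]
print(round(pstdev(expanded), 2))                  # 2.84
```

Because these intervals are one unit wide, the grouped and direct results agree exactly. With wider intervals the two differ slightly on account of the error of grouping.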


113. Meaning of the Standard Deviation. 


The likenesses and differences of the quartile deviation (Q) and the
standard deviation (σ) are shown in Fig. 21. The quartile deviation,
or semi-interquartile range, by definition takes into account the middle
50 per cent of the cases. That is, the ordinates (lines erected
perpendicular to the base line of the curve) erected at a scale distance
equal to Q on either side of the mean or median include 50 per cent
of the area of the surface between the curve and the base line. In
the diagram the lines K and L represent the ordinates erected at a
distance equal to Q on either side of the mean. The lines M and N
represent ordinates erected at a distance equal to σ on either side of
the mean. The standard deviation (σ) takes into account approximately
68 per cent (in a normal distribution 68.26 per cent) of the
area of such a distribution.

FIG. 21. Comparison of Standard Deviation (σ) and Quartile Deviation (Q).

In a normal distribution the value sigma bears a definite
relationship to the curve of the distribution itself. When a normal
distribution is presented in graphic form the result is a symmetrical
bell-shaped curve with many cases in the middle and few at the extremes.
Certain types of these characteristic bell-shaped distributions have
been described by mathematical formulas, from which the curve can be
constructed if certain basic data concerning the curve are given. In
these formulas sigma is one of the values which must be given in order
to construct such a curve. Sigma, in the typical formula, represents the
distance from the mean at which the curve changes from convex to
concave. In Fig. 21 the points where the curve changes its
character are indicated by the ordinates lettered M and N.

Thus, because of this direct mathematical relationship which the
standard deviation bears to the curve of the distribution itself, and the
reliable expression of variability which it provides due to the fact
that every deviation in the distribution is considered, the standard
deviation is one of the most useful of the measures of variability.
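The areas quoted for the normal curve can be verified numerically. The sketch below uses the error function from Python's standard library. The factor 0.6745, the distance of the quartile points from the mean of a normal distribution in units of sigma, is a standard result brought in for illustration and not a figure from the text.

```python
import math

def area_within(z):
    """Fraction of a normal distribution within z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2))

print(round(area_within(1.0), 4))     # 0.6827, between ordinates at one sigma
print(round(area_within(0.6745), 4))  # 0.5, between ordinates at Q = 0.6745 sigma
```

The first figure agrees closely with the 68.26 per cent quoted for the standard deviation. The second confirms that ordinates at a distance Q on either side of the mean enclose the middle half of the cases.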


114. Computing the Standard Deviation (σ) from Ungrouped Data.

In the computation of the standard deviation from ungrouped data,
as in the accompanying illustration, the mean of the distribution must
be found. When the data are grouped in a frequency table it is not
strictly necessary for the arithmetic mean to be computed, although it
is necessary to go through all but the last step of the process.
The steps in the computation of the standard deviation from
ungrouped data are given in detail in connection with data from
Table 42. See Table 43.


TABLE 43
DATA FOR CLASS A FROM TABLE 42

Test Scores     d (Deviation)     d² (Deviations Squared)
    122             +36                 1296
    116             +30                  900
    108             +22                  484
    101             +15                  225
     96             +10                  100
     92             + 6                   36
     89             + 3                    9
     86               0                    0
     83             − 3                    9
     80             − 6                   36
     76             −10                  100
     71             −15                  225
     64             −22                  484
     56             −30                  900
     50             −36                 1296
Total 1290                         Σd² = 6100
Mean    86

$\sigma = \sqrt{\dfrac{\Sigma d^2}{N}} = \sqrt{\dfrac{6100}{15}} = \sqrt{406.67} = 20.17$




The mean of the scores for Class A in Table 43 is 86. Thus a
score of 89 deviates from this mean by 3 points. A score of 96 devi-
ates 10 points, etc. The standard deviation (σ) is the square root of
the mean of the squares of these deviations from the mean of the
array of scores. It is necessary therefore to square each of these
deviations. These are given under the column headed d². Since each
deviation appears only once and the data are ungrouped, the formula
may be simplified to read $\sigma = \sqrt{\Sigma d^2 / N}$. The sum of
the deviations squared (Σd²) is 6100. The mean of these squared
deviations is therefore 406.67. To turn this value into units of the
scale the square root of this quantity must be taken. This value is
found to be 20.17, which is the standard deviation (σ) of this series
of scores. The mean of this distribution is 86. The σ is 20.17. This
means that, between scores 20.17 points larger and 20.17 points
smaller than this mean, approximately two-thirds (68.26 per cent)
of the scores would be found if the distribution were normal.
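The computation for ungrouped data can be sketched in a few lines. The scores are those of Class A in Table 43; the function name standard_deviation is illustrative only.

```python
from math import sqrt

# Test scores of Class A as listed in Table 43.
SCORES = [122, 116, 108, 101, 96, 92, 89, 86, 83, 80, 76, 71, 64, 56, 50]


def standard_deviation(scores):
    """Return (mean, sum of squared deviations, sigma) for ungrouped scores."""
    n = len(scores)
    mean = sum(scores) / n
    # Square the deviation of every score from the mean and add the squares.
    sum_d2 = sum((score - mean) ** 2 for score in scores)
    # Sigma is the square root of the mean of the squared deviations.
    return mean, sum_d2, sqrt(sum_d2 / n)


if __name__ == "__main__":
    mean, sum_d2, sigma = standard_deviation(SCORES)
    print(f"mean = {mean}, sum of d squared = {sum_d2}, sigma = {sigma:.2f}")
```

Note that the divisor is N, the number of cases, as in the formula of the text, and not N − 1 as in some later statistical treatments.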


115. Computing the Standard Deviation from Grouped Data.

The method of computing the standard deviation from ungrouped
data illustrated in Table 43 may be applied with very few changes to
the calculation of sigma from grouped data. A slight change in the
general formula is required, for, when the scores are grouped in class
intervals, the deviations of the scores must be considered by groups,
each score being treated as though it fell at the mid-point of the step
in which it is found. This permits the expression of the deviations in
steps rather than in units of the scale. The formula for use in
calculating the standard deviation when the data are grouped in a
frequency distribution is

$\sigma = s\sqrt{\dfrac{\Sigma fd^2}{N} - c^2}$,

in which s is the size of the step, f the frequency in each step, d the
deviation of the step from the assumed mean, N the number of cases,
and c the correction. The steps in the application of this formula in
the calculation of the standard deviation of the scores originally
presented in Table 38 will make clear all the processes involved in
finding the sigma of a frequency distribution. The computations
themselves are shown in Table 44.

TABLE 44
DATA FROM TABLE 38

Class Intervals   Mid-Points    f     d      fd     fd²
  14.5-15.5          15         1    +8     + 8     64
  13.5-14.5          14         0    +7       0      0
  12.5-13.5          13         0    +6       0      0
  11.5-12.5          12         1    +5     + 5     25
  10.5-11.5          11         1    +4     + 4     16
   9.5-10.5          10         2    +3     + 6     18
   8.5- 9.5           9         2    +2     + 4      8
   7.5- 8.5           8         3    +1     + 3      3
                                           (+30)
   6.5- 7.5           7         4     0       0      0
   5.5- 6.5           6         6    −1     − 6      6
   4.5- 5.5           5         3    −2     − 6     12
   3.5- 4.5           4         1    −3     − 3      9
   2.5- 3.5           3         2    −4     − 8     32
   1.5- 2.5           2         1    −5     − 5     25
                                           (−28)
                  N = 27                    + 2    218

$c = \dfrac{30 - 28}{27} = \dfrac{2}{27} = +0.074$;  $c^2 = 0.005$.

$\dfrac{\Sigma fd^2}{N} = \dfrac{218}{27} = 8.074$.

8.074 − 0.005 = 8.069.

σ in steps = $\sqrt{8.069}$ = 2.84.

Step of 1 in table, therefore σ = 2.84.

Step 1. Assume a mean as near as possible to the true mean of
the distribution in order that the correction (c) may be as small as
possible. If the correction is larger than 0.5 step it may be desirable
to start the work over and assume a mean nearer the true mean. In
Table 44 the step with a mid-point of 7 is taken as the zero step.

Step 2. Lay off deviations above and below the step assumed as
zero, and multiply the frequencies in each step by the deviation of
the step exactly as in the calculation of the arithmetic mean.

Step 3. Summate the plus fd's and the minus fd's algebraically. The
sum of the fd's in this problem is +2 units. Divide this by the num-
ber of cases in the table (N = 27), and the resulting correction (c)
is +0.074. This correction is the same as the one used in computing
the mean.

Step 4. Square this correction in order to have it in the same de-
nomination as the values from which it must later be subtracted. The
square of c (+0.074) is 0.005 in this problem.

Step 5. Multiply the values under the column headed fd by
the values under d. This will give a column of values known
as fd². Summate this column. In this problem the sum of the fd²
is 218.

Step 6. Divide the sum of the fd² by N, the number of cases in
the distribution. The result of this division is 8.074.

Step 7. Since this value, 8.074, is always too large in proportion
to the amount the true mean deviates from the assumed mean (in this
case, the amount represented by the value c) it must be reduced an
amount equal to the square of c. Thus,

8.074 − 0.005 = 8.069.

Step 8. The sigma of this distribution expressed in steps is next
obtained by extracting the square root of the value 8.069. The square
root of this value to two decimal places is 2.84. Since the class inter-
vals used in this frequency table are steps of one unit, the standard
deviation is therefore 2.84.⁵
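The eight steps above can be sketched as a short routine. This is a sketch under stated assumptions: the mid-points and frequencies are read from Table 44, the divisions are carried through without intermediate rounding, and the names grouped_sigma and TABLE_44 are illustrative rather than part of the text.

```python
from math import sqrt

# (mid-point, frequency) for each step of Table 44; steps are 1 unit wide.
TABLE_44 = [(15, 1), (14, 0), (13, 0), (12, 1), (11, 1), (10, 2), (9, 2),
            (8, 3), (7, 4), (6, 6), (5, 3), (4, 1), (3, 2), (2, 1)]


def grouped_sigma(table, assumed_mean, step=1.0):
    """Short-method mean and sigma from a frequency table."""
    n = sum(f for _, f in table)
    # Steps 1 and 2: deviation of each step from the assumed mean, in steps.
    devs = [((mid - assumed_mean) / step, f) for mid, f in table]
    # Step 3: algebraic sum of the fd's, and the correction c.
    sum_fd = sum(f * d for d, f in devs)
    c = sum_fd / n
    # Step 5: sum of the fd squared column.
    sum_fd2 = sum(f * d * d for d, f in devs)
    # Steps 4, 6, 7 and 8: divide by N, subtract c squared, take the root,
    # and convert from steps back to units of the scale.
    sigma = step * sqrt(sum_fd2 / n - c * c)
    return {"N": n, "sum_fd": sum_fd, "sum_fd2": sum_fd2, "c": c,
            "mean": assumed_mean + step * c, "sigma": sigma}


if __name__ == "__main__":
    for key, value in grouped_sigma(TABLE_44, assumed_mean=7).items():
        print(key, round(value, 3))
```

Calling the routine with a different assumed mean changes c and the sum of the fd² but leaves the resulting sigma unchanged, which is a convenient check on the work.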


116. Using the Standard Deviation. 

Assignment of Class Grades. The student or teacher who is inter- 
ested in the critical analysis of test scores will find the standard devia- 
tion a very useful and reliable instrument for the purpose. For ex- 
ample, it offers the basis for a practical plan for turning scores on 
objective tests into class marks. The importance of this practice is 
so great that the steps involved in the technique are given in detail. 
The computations described are based upon the objective test scores 
from a class of 45 pupils given in Table 45. The student will do well 
to check all these computations for errors. 

5 Problems 19, 20, and 21 in Greene's Work-Book in Educational Measure-
ments (Longmans) furnish excellent supplementary practice in the computation
of the standard deviation.


TABLE 45
STANDARD DEVIATION TECHNIQUE FOR ASSIGNING CLASS GRADES *

Mid-Points   Class Intervals     f     d      fd      fd²
   110        107.5-112.5        1   +10     +10     100
   105        102.5-107.5        2   + 9     +18     162
   100         97.5-102.5        2   + 8     +16     128
    95         92.5- 97.5        4   + 7     +28     196
    90         87.5- 92.5        0   + 6       0       0
    85         82.5- 87.5        2   + 5     +10      50
    80         77.5- 82.5        2   + 4     + 8      32
    75         72.5- 77.5        3   + 3     + 9      27
    70         67.5- 72.5        3   + 2     + 6      12
    65         62.5- 67.5        4   + 1     + 4       4
                                            (+109)
    60         57.5- 62.5        7     0       0       0
    55         52.5- 57.5        7   − 1     − 7       7
    50         47.5- 52.5        4   − 2     − 8      16
    45         42.5- 47.5        1   − 3     − 3       9
    40         37.5- 42.5        1   − 4     − 4      16
    35         32.5- 37.5        2   − 5     −10      50
                                            (−32)
                             N = 45          +77     809

Test scores, by grade group:
A  (11.1%): 109, 104, 103, 102, 99
B  (17.8%): 95, 95, 94, 93, 84, 83, 79, 79
C  (35.5%): 77, 76, 76, 71, ..., 69, 64, 64, ..., 60, 60, 60, ...
D  (31.1%): 57, 56, ...
Fd  (4.5%): ...

$\text{A.M.} = 60 + s\,\dfrac{\Sigma fd}{N} = 60 + 5\left(\dfrac{77}{45}\right) = 60 + 5(1.71) = 60 + 8.55 = 68.55$

$\text{S.D.} = s\sqrt{\dfrac{\Sigma fd^2}{N} - c^2} = 5\sqrt{\dfrac{809}{45} - \left(\dfrac{77}{45}\right)^2} = 5\sqrt{17.98 - 2.92} = 5\sqrt{15.06} = 5 \times 3.88 = 19.40$

Find score limits:
68.55 + ½(19.40)  = 78.25, upper limit of C group.
68.55 + 1½(19.40) = 97.65, upper limit of B group.
68.55 − ½(19.40)  = 58.85, upper limit of D group.
68.55 − 1½(19.40) = 39.45, upper limit of Fd group.

A = above 97.65.   B = 78.25 to 97.65.   C = 58.85 to 78.25.
D = 39.45 to 58.85.   Fd = below 39.45.

* For a complete explanation and discussion of the many problems involved in objectify-
ing teachers' marks see Bangs, C. W., and Greene, H. A., "Teachers' Marks and the Marking
System," University of Iowa Extension Bulletin No. 244, College of Education Series No. 26,
May 15, 1930.


Step 1. Prepare a suitable frequency table of the test scores, lay
off the deviations from the assumed mean, and find the sum of
the fd's and the arithmetic mean. The mean of this distribution
is 68.55.

Step 2. Compute the standard deviation (σ) of this distribution by
multiplying the fd's by the deviations in steps, summating the fd²'s,
dividing this sum of the fd²'s by the number of cases, subtracting from
this quotient the square of the c used in finding the arithmetic mean,
extracting the square root of this remainder, and multiplying this root
by the size of the step used in the table. The standard deviation of
this distribution found in this manner is 19.40.

Step 3. Since a distance of two and one-half sigma units above
and below the mean includes almost 99 per cent of all cases in a dis-
tribution this number of sigma units is laid off above and below the
mean. This naturally results in placing one of the sigma units in the
middle of the distribution in such a way that one-half of the sigma
distance of the middle unit extends above the mean and one-half
below. Accordingly, to the arithmetic mean of 68.55 add one-half
of the standard deviation (one-half of 19.40). This gives a value of
78.25, which becomes the upper limit of the group of scores which will
be assigned grades of C.

Step 4. The upper limit of the group of scores to be assigned B's
is found by adding one and one-half standard deviation units to the
arithmetic mean. Thus, 68.55 + 1.5 (19.40) = 97.65, which is the
upper limit of the B group.

Step 5. The upper limit of the D group is found by subtracting
one-half of a standard deviation unit from the mean. 68.55 − 0.5
(19.40) = 58.85.

Step 6. The upper limit of the Fd group is obtained by subtract-
ing one and one-half sigma units from the mean of the distribution.
68.55 − 1.5 (19.40) = 39.45.

Step 7. From these values the score limits of this distribution may
be set up. Class grades may then be assigned as indicated to the
scores within the limits specified.

GRADES     SCORE LIMITS
  A        97.65 and above
  B        78.25 to 97.65
  C        58.85 to 78.25
  D        39.45 to 58.85
  Fd       below 39.45
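The score limits just listed follow mechanically from the mean and the standard deviation, and can be sketched as below. The function names are illustrative, and the treatment of a score falling exactly on a limit (it is given the higher grade, in keeping with "97.65 and above") is an assumption of this sketch.

```python
def grade_limits(mean, sigma):
    """Upper score limits of the Fd, D, C and B groups; A lies above the last."""
    return {
        "Fd": mean - 1.5 * sigma,
        "D": mean - 0.5 * sigma,
        "C": mean + 0.5 * sigma,
        "B": mean + 1.5 * sigma,
    }


def assign_grade(score, limits):
    """Return the class grade for one score, working up from the lowest group."""
    for grade in ("Fd", "D", "C", "B"):
        if score < limits[grade]:
            return grade
    return "A"


if __name__ == "__main__":
    limits = grade_limits(mean=68.55, sigma=19.40)
    for grade, upper in limits.items():
        print(f"upper limit of {grade} group: {upper:.2f}")
    # A few of the scores listed in Table 45.
    for score in (109, 95, 77, 57):
        print(score, assign_grade(score, limits))
```

Because the limits depend only on the mean and sigma of the class at hand, the same routine applied to another test will yield different score limits.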


It is readily apparent that practically no subjective factors are
involved in the assignment of grades by this method. The objective
test scores of the 45 pupils used in the illustration are changed by this
treatment into 5 A's, 8 B's, 16 C's, 14 D's, and 2 Fd's. The score
limits are determined by the standard deviation units and would be
the same no matter who assigned the grades. It should be noted, how-
ever, that these limits hold only for this particular distribution and
must not be assumed to be true for any other test. The teacher should
also remember that this method of grading does not take into account
the absolute level of ability at which a particular class works. The
superior pupil in an average or poor class receives an A by this
method just as readily as does the superior pupil in a very superior
class. This is probably less serious than it sounds, however, for most
class groups large enough to warrant the application of this technique
average out quite well in this respect.

Basis for T-Scores and Other Derived Scores. The standard devi-
ation also affords the basis for derivation of a number of useful
derived scores in test interpretation. For example, the T-score now
commonly used in commensurating test scores depends upon the
standard deviation for its basic unit. For many years prior to the
development and popularization of the T-score, test scores were ex-
pressed in terms of their position in the total distribution. For in-
stance, a pupil's score might be a member of a distribution having a
mean of 60 points and standard deviation of 5 points. A score of
50 in the test would be designated by a standard score of −2.0 sigma
units, since it lies exactly two standard deviation units below the
mean of the distribution. This same procedure is used in assigning
T-scores. The formula for the T-score is
$T = \dfrac{X - m}{\sigma} \times 10 + 50$, in
which m is the mean of the distribution, X − m the deviation of the
score X from that mean, and σ the standard deviation of the
distribution. The deviation of any test score from the mean, expressed
in sigma units, is multiplied by 10 in order to remove decimal points.
The 50 points are added in order that there may be no negative
scores. The T-score is a very convenient method of interpreting test
scores. A T-score of 50 means that the individual pupil's score was
right at the mean, and T-scores of 40 and 60 mean that it was one
standard deviation below or above the mean. In a normal distribution
the lower and upper quartiles fall at T-scores of about 43 and 57.
This fact makes it easy to attach meaning to the test scores.

Scaling of Test Items. The standard deviation, along with certain
other measures of variability, represents a convenient unit in which
to evaluate the difficulty of test items. When used under these con-
ditions the standard deviation of a theoretically normal curve of the
specified ability is used as the unit in laying off differences in diffi-
culty along a linear scale. As a first step in the procedure, the per-
centage of pupils failing on each item or exercise must be secured.
By means of tables based upon the normal curve these percentages
of failure are changed into standard deviation units which express the
position of the exercises with respect to the mean ability of an infinite
and normal population. Exercises which are answered successfully by
50 per cent of the class are assigned a position at the mean. Exercises
missed by 55 or 60 per cent of the class are given sigma values above
the mean, etc. A significant feature of this procedure, however, is
the fact that a difference in difficulty of 5 per cent near the mean
results in a relatively small sigma difference, while a 5 per cent dif-
ference near the extremes of the distribution makes a relatively large
sigma difference. This is in conformity with the fact that because
of the height of the curve near the mean a smaller distance along
the linear scale on the base line is required to add a given area of
the curve. Thus, the difference in the sigma values assigned to two
test items having percentages of failure of 55 and 60 is 0.13
standard deviation unit (2.74−2.61),⁶ while the difference in apparent
difficulty of two items failed by 90 and 95 per cent of an experimental
group is 0.34 standard deviation unit (4.09−3.75). The net result
of this method of item evaluation is to magnify somewhat the sim-
plicity of the very easy items and the difficulty of the very hard ones.
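The unequal sigma value of equal percentage differences can be illustrated with the normal curve directly. The sketch below measures each item's position from the mean of a unit normal curve by means of the inverse of the cumulative distribution; a printed table of sigma values may use a different zero point or a different convention, so the individual values, and to a lesser degree the differences, need not agree exactly with those quoted from such a table. The function name sigma_value is illustrative.

```python
from statistics import NormalDist


def sigma_value(percent_failing: float) -> float:
    """Position of an item, in sigma units from the mean of a normal curve.

    An item failed by 50 per cent of the group falls at the mean (0.0);
    items failed by more than 50 per cent receive positive values.
    """
    return NormalDist().inv_cdf(percent_failing / 100.0)


if __name__ == "__main__":
    near_mean = sigma_value(60) - sigma_value(55)
    near_extreme = sigma_value(95) - sigma_value(90)
    print(f"55 vs 60 per cent failing: {near_mean:.2f} sigma")
    print(f"90 vs 95 per cent failing: {near_extreme:.2f} sigma")
```

The same 5 per cent difference in failure thus corresponds to a much larger distance along the base line when it falls near the extreme of the distribution than when it falls near the mean.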

Sigma units are also utilized in the construction of scales for the
estimation of the merit or quality of certain classroom products. The
use of these units in the derivation of such scales is discussed in detail
in Chapter XII of this book.
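The conversion of a raw score into a T-score, discussed earlier in this section, can be sketched as follows. The sketch assumes that the T-score is ten times the deviation of the score from the mean, expressed in sigma units, plus 50; the function name t_score is illustrative only.

```python
def t_score(score, mean, sigma):
    """Deviation from the mean in sigma units, multiplied by 10, plus 50."""
    return 10 * (score - mean) / sigma + 50


if __name__ == "__main__":
    # The example of the text: mean 60, standard deviation 5.
    # A score of 50 lies two sigma units below the mean.
    for score in (50, 55, 60, 65, 70):
        print(score, t_score(score, mean=60, sigma=5))
```

Scores from tests with different means and standard deviations become directly comparable once each has been turned into a T-score in this manner.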


IV. MEASURES OF RELATIONSHIP 


The critical analysis and interpretation of educational test results
often make it necessary for the teacher and research student to secure
more exacting descriptions of the situation than are afforded by the
simple measures of central tendency and variability. For example,
the matter of the selection of the test itself is one which cannot be
determined wholly on the basis of the arithmetic means and the
standard deviations. The most useful information for this purpose
is found by determining the relationship which exists between the
ability to be measured and the tests or measures under consideration.
In the construction and use of informal objective examinations there
are occasions when it is necessary to discover exactly how accurately
the examination measures, and how much this accuracy would be
improved by increasing the length of the examination. This type of
analysis also requires the use of the method of correlation, the method
which permits the determination of relationships among different
measures of the same individuals.


117. The Correlation Coefficient. 

In the statistical expression of relationships, as in the other meas-
ures, it is desirable that this relation between two series of variables
be expressed in a single objective or mathematical value. A number
of different ways have been proposed for the derivation of these ex-
pressions of relationship, but no one of them has produced a term
which is both objective and easily interpreted. Methods of comput-
ing relationships in terms of the correspondence between rank posi-
tions of scores, and in terms of the percentage of the scores falling
within a specified unit of variability of each other, have been pro-
posed. In general these procedures lack sufficient exactness to war-
rant their common use in the analysis of test results. The student
who is interested in these different methods of revealing relationships
will find them discussed in certain of the treatments on statistical
methods listed in the references at the end of this chapter. In this
discussion, attention is given exclusively to the Pearson product-
moment method, which is not only the basic method but also the one
most commonly used in educational statistics.

6 See Table 5 on page 392 of Rugg's Statistical Methods Applied to Education
(Houghton Mifflin), or any similar table of sigma values.

The index expressing the degree of relationship between two series
of measures is called a coefficient of correlation. The coefficient re- 
sulting from the application of the Pearson product-moment method 
is designated as r. The possible values of r range from perfect posi- 
tive relationships (+ 1.0) through all the possible decimal values 
through zero to — 1.0 indicating a perfect negative relationship. An 
r of zero is taken to mean that no relationship exists between the 
measures or that it is entirely due to chance. Positive relationships 
may be expected between such factors as barometric readings and at-
mospheric pressure, or between each of such factors as native capacity,
initiative, effort, concentration, interest, and school accomplishment 
in a given field. Negative relationships are usually found to exist 
within a given school grade between the chronological age of the 
pupils and their achievement scores on a reliable achievement test. 
Many low or zero relationships are found in educational data, but the 
best illustration of this type of correlation is one in which pure chance 
operates. If two packs of numbered cards are shuffled carefully and 
cards are drawn from each pack and paired, the resulting relationship 
is due to pure chance, and the coefficient of correlation (r) approaches 
zero. If the same packs of cards are rearranged so that the numbers 
appear in ascending order in each pack and cards are drawn off the 
top of each and paired as before, the r obtained should be positive and 
very high. If one of the packs is inverted and the cards are drawn 
as before, the result should be a high negative correlation. 
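The experiment with the two packs of numbered cards can be imitated by a short simulation. This is only a sketch: the size of the pack, the seed of the random shuffle, and the names pearson_r and card_experiment are assumptions made for illustration, and the coefficient is computed directly from the raw numbers rather than from a correlation table.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment coefficient of correlation for two paired series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_x2 * sum_y2)


def card_experiment(n_cards=1000, seed=0):
    """Pair cards from two packs: shuffled, both ascending, one inverted."""
    rng = random.Random(seed)
    ordered = list(range(1, n_cards + 1))
    pack_a, pack_b = ordered[:], ordered[:]
    rng.shuffle(pack_a)
    rng.shuffle(pack_b)
    return {
        "shuffled": pearson_r(pack_a, pack_b),
        "ascending": pearson_r(ordered, ordered),
        "inverted": pearson_r(ordered, ordered[::-1]),
    }


if __name__ == "__main__":
    for arrangement, r in card_experiment().items():
        print(f"{arrangement:>9}: r = {r:+.3f}")
```

With the packs in the same order every pair agrees exactly and r is +1.0; with one pack inverted r is −1.0; with both packs shuffled r departs from zero only by chance, and the departure tends to shrink as the packs are made larger.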

The Pearson product-moment method of computing correlations,
while involving a large number of rather complicated calculations,
really calls for very few skills that the student has not encountered
previously in this work. This coefficient of correlation (r) is a single
value expressing the tendency of corresponding pairs of measures to
deviate similarly from their respective means. The discussion which
follows deals with the computation of the coefficient from data
arranged in frequency tables of the double-entry type.


The double-entry or correlation table permits the simultaneous
tabulation of two measures of the same individuals. The class inter-
vals are set up exactly the same as in preparing a simple frequency
table. In fact, it is merely an overlapping table with two sets of class
intervals, one reading upward along the left-hand margin and one
reading to the right along the top. Such a double-entry tabulation
is shown in Table 46, which also serves to illustrate the specific steps
involved in computing the correlation coefficient.

The data in Table 46 represent the very real problem of determin-
ing the reliability of an experimental test by finding the correlation
of the scores made on one-half of the test with the scores made on
the other half. Let us assume that Pupil A made scores of 25 and
29 on the two halves of the test. The position of the score of 25 on
the first half is found in the scale for that part of the test. This is
in the step with a mid-point of 24. We then move across the table
horizontally until the proper space is found for the score of 29 on the
second half of the test. This is in the step with a mid-point of 30.
A single tabulation mark in that space identifies both scores and at the
same time indicates something of their relation to each other. In
such a table a tendency for the frequencies to group themselves along
the diagonal of the table itself is an indication of a positive relation-
ship. Scores which deviate from the diagonal reduce the relationship.
Figure 22 indicates something of the types of relationship which may
be expected from characteristic groupings of data. This interpretation
can be made only when the scales of the tables read upward and to
the right.

FIG. 22. Types of Relationship. (Three panels: r High Positive;
r High and Negative; r Zero or Indifferent.)
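The tabulation of paired scores into a double-entry table can be sketched as follows. The class-interval width of 3, with mid-points at 24, 27, 30, and so on, is an assumption inferred from the mid-points mentioned for Pupil A; the two additional pupils in the usage lines are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter
from math import floor

STEP = 3  # assumed class-interval width (mid-points ..., 24, 27, 30, 33, ...)


def mid_point(score, step=STEP):
    """Mid-point of the class interval containing the score.

    With a step of 3, scores from 22.5 up to 25.5 fall at mid-point 24.
    """
    return step * floor((score + step / 2) / step)


def correlation_table(pairs, step=STEP):
    """Tally pupils by (first-half mid-point, second-half mid-point)."""
    return Counter((mid_point(first, step), mid_point(second, step))
                   for first, second in pairs)


if __name__ == "__main__":
    # Pupil A of the text, followed by two made-up pupils.
    pupils = [(25, 29), (24, 31), (10, 11)]
    for cell, tally in sorted(correlation_table(pupils).items()):
        print(cell, tally)
```

Each key of the resulting tally names one space of the table, and its count is the number of tabulation marks that would be entered in that space.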
118. Computation of Pearson Coefficient of Correlation.

The Pearson product-moment coefficient of correlation is found by
the solution of the following formula:

$r = \dfrac{\dfrac{\Sigma xy}{N} - c_x c_y}{\sigma_x \sigma_y}$

in which N is the number of cases in the distribution, $\sigma_x$ the standard
deviation of the distribution on the x-axis, $\sigma_y$ the standard deviation
on the y-axis, $c_x$ the correction on the x-axis, $c_y$ the correction on the
y-axis, and Σxy the summation of the products of the deviations of
each measure from the assumed means of the distributions.

TABLE 46
CORRELATION TABLE SHOWING RELATION OF HALF-TEST SCORES ON
IOWA PLANE GEOMETRY APTITUDE TEST
[Double-entry table, with the mid-points of score on second half of
test reading to the right along the top; N = 414, Σxy = 1780.]

The following steps are involved in the solution of the formula
as applied to the data given in Table 46:

Step 1. The data are already tabulated in a double-entry or corre-
lation table, so the first step in the work is to total the frequencies on
each axis and cross-check the totals.

Step 2. Assume suitable means for each axis of the table and lay
off steps above and below the zero step. Compute the corrections
on the x-axis and on the y-axis exactly as in finding the arithmetic
mean and the standard deviation. In this work the $c_x$ is +0.307 and
the $c_y$ is −0.109.

Step 3. Multiply the fd column by the d column and summate
the resulting fd²'s separately for each distribution. Complete the cal-
culation of the standard deviation for each distribution. Leave the
resulting sigmas in terms of steps. This will save an extra multipli-
cation and unnecessary division later in the work.

Step 4. Find the sum of the xy products. This constitutes the
only absolutely new step in the process up to this point. In the
product-moment method each score or frequency is weighted in pro-
portion to the distance it lies away from the means of the distribu-
tions. Thus, in the example, the single individual score which lies in
the extreme upper right-hand section of the table deviates a distance
of +6 steps from the mean of the y-axis and +7 steps from the mean
of the x-axis. The combined moment of this single case is found
by multiplying it by the product of 6 and 7 (1 × +6 × +7 = 42).
This case therefore has a moment of 42. The three cases at the in-
tersection of the steps with mid-points of 33 have a combined product
of 90 (+5 × +6 × 3 = 90). All cases in the upper right-hand and
lower left-hand quadrants have a positive moment, owing to the alge-
braic law of signs in multiplication. All frequencies lying in the
upper left-hand or the lower right-hand quadrants of the table have
negative product-moments since the product of a plus step-deviation
by a minus step-deviation results in a negative sign. In this work the
xy products are summated algebraically in a column along the right-
hand side of the table. The total of the xy products is 1780.

Step 5. The numerator of the fractional equation representing the
correlation coefficient is $\dfrac{\Sigma xy}{N} - c_x c_y$. This quantity is found by divid-
ing 1780 by the number of cases in the distribution and subtracting
(algebraically) the product of the corrections for the two distributions.
The result of dividing 1780 by 414 is 4.2995. The $c_x c_y$ product is
−0.0335. Since this $c_x c_y$ product is negative (owing to the negative
sign of one of the corrections), the net result is the addition of these
two quantities. The numerator of the fractional part of the formula
now becomes 4.33.

Step 6. The denominator of the fractional part of the formula
comprises the product of the two standard deviations. The $\sigma_x \sigma_y$ prod-
uct for this correlation table is 4.995.

Step 7. The correlation coefficient (r) is the ratio of the two values
found in steps 5 and 6. The r of this distribution is therefore +.867,
which means that the relationship is positive and significantly high.
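The arithmetic of steps 5 to 7 can be sketched in code, together with a general routine for any double-entry table. The summary values are those quoted in the steps above, with +0.307 and −0.109 taken as the corrections on the x-axis and y-axis respectively; the function names, and the representation of a table as a list of (x step-deviation, y step-deviation, frequency) cells, are assumptions made for this sketch.

```python
from math import sqrt


def r_from_summary(sum_xy, n, c_x, c_y, sigma_x_sigma_y):
    """Steps 5 to 7: numerator, denominator, and their ratio."""
    numerator = sum_xy / n - c_x * c_y
    return numerator / sigma_x_sigma_y


def r_from_cells(cells):
    """Coefficient from cells of (dx, dy, f).

    dx and dy are step-deviations from the assumed means on the two axes,
    and f is the frequency tallied in that space of the table.
    """
    cells = list(cells)
    n = sum(f for _, _, f in cells)
    c_x = sum(f * dx for dx, _, f in cells) / n
    c_y = sum(f * dy for _, dy, f in cells) / n
    # Sigmas are left in terms of steps, as recommended in step 3.
    sigma_x = sqrt(sum(f * dx * dx for dx, _, f in cells) / n - c_x ** 2)
    sigma_y = sqrt(sum(f * dy * dy for _, dy, f in cells) / n - c_y ** 2)
    sum_xy = sum(f * dx * dy for dx, dy, f in cells)
    return r_from_summary(sum_xy, n, c_x, c_y, sigma_x * sigma_y)


if __name__ == "__main__":
    r = r_from_summary(sum_xy=1780, n=414, c_x=0.307, c_y=-0.109,
                       sigma_x_sigma_y=4.995)
    print(f"r = {r:+.3f}")
```

Because the sigmas are left in steps, the size of the class interval never enters the computation; it would appear in both numerator and denominator and cancel.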


119. Meaning of the Correlation Coefficient. 


It was suggested earlier in this discussion that the interpretation 
of the correlation coefficient is probably much more difficult than its 
computation. There are a few devices which are helpful to the in- 
experienced worker, but, in general, assurance in the interpretation of 
these measures comes only through experience. The suggestions given 
in this section may be useful during the period when this experience 
is being gained. 

One of the important outcomes of the use of correlation methods
is that within certain limits it makes possible the estimating of un-
known values from known values. The accuracy of this estimate,
however, depends directly upon the correlation between the factors
measured. For example, if it is known from previous experience that
there is a high positive relationship between achievement in a specific
manual arts subject and the students' response to certain manual dex-
terity tests, the probable achievement of a group of prospective
students in this manual arts course may be determined within limits
by securing their response to the manual dexterity test. A correlation
coefficient of +1.0 for the two factors would mean that an estimate
of accomplishment based on the one factor would be approximately
100 per cent correct. As the amount of the correlation decreases the
accuracy of the forecast declines, but not in a direct manner. A corre-
lation of +1.0 may mean 100 per cent accuracy in the estimate based
on the relationship; but a correlation of +.50 does not mean at all
that the estimate based on it will be 50 per cent correct. A glance
at Table 47 will demonstrate this interesting fact about the
correlation coefficient.


The percentages of forecasting accuracy for different values of r
given in Table 47 are obtained by applying Kelley's proposed formula
for the coefficient of alienation ($k = \sqrt{1 - r^2}$) and then deducting
the resulting values expressed as percentages from 100 per cent. If
estimates of one variable are to be made from measurements of
another related variable, this table will prove to be a useful safeguard.

TABLE 47
PERCENTAGE OF FORECASTING ACCURACY
FOR SPECIFIC VALUES OF r

Coefficient of      Percentage of
Correlation         Forecasting Efficiency
   1.00                 100
    .99                  86
    .98                  80
    .95                  69
    .90                  56
    .866                 50
    .80                  40
    .75                  34
    .70                  29
    .65                  24
    .60                  20
    .50                  13
    .40                   8
    .30                   5
    .20                   2
    .10                   ½

The following illustrations and practical interpretations of typical
correlation coefficients representative of the sort obtained from educa-
tional data have been gleaned from a number of sources. They are
offered here for whatever guidance they may give to the student or
teacher interested in this type of test analysis.

r value   Educational Situation and Interpretation

 +.96     Educational situation: Relation of two forms of a long,
          analytical reading test for high-school students.
          Interpretation: Evidence of unusually high reliability of
          measurement; treat scores with confidence.

 +.80     Educational situation: Repetition of same form of group test
          of mental ability after a lapse of one semester.
          Interpretation: Evidence of a marked relationship; considerable
          prognostic power even after lapse of a long interval.

 +.65     Educational situation: A composite of three separate estimates
          by same teacher of the ability of a class of 35 students to
          respond to an objective test in industrial arts.
          Interpretation: Evidence of some relationship but of limited
          use for prognostic purposes.

 +.50     Educational situation: Scores on a good group intelligence test
          and the class grades of a class in industrial arts.
          Interpretation: A very slight relationship of no practical
          value for forecasting purposes (only 13 per cent effective).

 −.24     Educational situation: Chronological ages of pupils in a given
          grade and achievement scores on an objective test.
          Interpretation: Negative relationship of an indifferent sort.
          Merely shows a very slight tendency for younger pupils in a
          grade to achieve at a higher level than the average.
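The percentages of forecasting efficiency in Table 47 can be reproduced from the coefficient of alienation. The following is a minimal sketch; the function name forecasting_efficiency is illustrative, and the results are rounded only when printed.

```python
from math import sqrt


def forecasting_efficiency(r: float) -> float:
    """100 per cent less the coefficient of alienation expressed in per cent."""
    k = sqrt(1.0 - r * r)  # coefficient of alienation
    return 100.0 * (1.0 - k)


if __name__ == "__main__":
    for r in (1.00, 0.99, 0.95, 0.90, 0.866, 0.80, 0.50, 0.30, 0.10):
        print(f"r = {r:5.3f}   efficiency = {forecasting_efficiency(r):5.1f} per cent")
```

The rapid fall of the percentage as r drops below .90 is the point of the table: a coefficient must be very high before estimates based upon it are much better than a guess.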




120. Practical Uses of Correlation Method. 


The teacher of industrial arts or the student of measurement in 
this field will unquestionably find the greatest use for correlation 
techniques in connection with the analysis of objective tests. The 
validity of a test may be expressed statistically in terms of the corre- 
lation of the instrument with some other criterion, such as another 
measure of known validity. The determination of the reliability of a 
test is almost entirely a matter of correlation method. Mastery of 
these uses of the correlation techniques will make the teacher a better 
critic of commercial standardized materials as well as a more inde- 
pendent and efficient builder and critic of teacher-made tests for class- 
room use. Such mastery can come only from extensive practice on 
problems calling for the use of these skills.” Students who are inter- 
ested in the theory of correlation or in the use of correlation methods 
in more critical research with tests are referred to the many excellent 
textbooks on statistical methods now available, 


V. ASSIGNMENT OF RELATIVE AND ABSOLUTE RANKS 
121. Relative Ranks.


Achievement as expressed in test scores can have meaning only
when it is possible to consider it in relation to some other, known
level of achievement. In many cases, as when informal objective
examinations are used, no definite standards or norms of achievement
are available. Some simple method of comparing the accomplishment
of each pupil in relation to the other individuals in the class is
essential. The process of assigning relative ranks to pupils' scores
in terms of their size is one way of doing this. This is accomplished
by assigning to the individual making the highest score the first
position in the class, the pupil making the second highest score the
second position, etc. The assignment of such relative ranks is quite
simple where the individual pupils make different scores, or where no
tied scores appear. The illustration given in Table 48 shows how all
such tied scores are treated in the assignment of relative ranks.


TABLE 48

Pupil   Score   Rank
  A      15      1
  B      12      2
  C      11      3
  D      10      4.5
  E      10      4.5
  F       9      7
  G       9      7
  H       9      7
  I       8      9
  J       7     10


Pupil A, with a score of 15 points, is assigned first position. Pupil
B, with a score of 12, is given second position. Pupil C, with a score
of 11, is given third place. Pupils D and E, both with scores of 10,
would normally be assigned fourth or fifth places, but since it is
impossible to assign fourth or fifth place to one rather than the
other, tied or average rank is assigned to each. In this instance 4.5
position is given to each of the pupils making a score of 10 points.
Pupils F, G, and H are also tied with scores of 9 points each, but
since they would regularly be assigned sixth, seventh, and eighth
positions they are each given the average of these positions, or
seventh place. The practice, therefore, in all cases of tied scores is
to assign average rank to the tied scores. When the number of tied
scores is even, the ranks assigned will lie mid-way between the ranks
which would ordinarily be assigned to the two middle scores. When the
number of tied scores is odd, the position assigned to all is the
position which would normally fall to the middle score. In general,
the position assigned to the pupil with the lowest score will agree
with the number of cases in the series except when the last scores are
tied.


7 See Problems 22, 23, 24, 25, 39, and 40 in Greene's Work-Book in
Educational Measurements (Longmans).
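The tied-rank rule described in this section can be sketched in a few lines of code. The function name `relative_ranks` is an assumption made for this illustration; the scores are those discussed for Table 48.

```python
def relative_ranks(scores):
    """Rank scores from highest (rank 1) down, giving tied scores their average rank."""
    # Indices of the scores arranged from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend `end` across the run of scores tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # The tied scores would occupy positions pos+1 through end+1;
        # each receives the average of those positions.
        average = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = average
        pos = end + 1
    return ranks


table_48_scores = [15, 12, 11, 10, 10, 9, 9, 9, 8, 7]  # Pupils A through J
ranks = relative_ranks(table_48_scores)
print(ranks)  # [1.0, 2.0, 3.0, 4.5, 4.5, 7.0, 7.0, 7.0, 9.0, 10.0]
```

Because the last score is not tied, the lowest rank equals the number of cases, as the text observes.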


122. Absolute Ranks. 


The practice of assigning relative positions to pupils on the basis of
their test scores, though aiding in the interpretation in some ways,
conceals something of the actual situation: the assignment of relative
ranks covers up the true differences in the size of scores. In Table 48
the difference of three score points between Pupils A and B is
indicated by only a single position in rank, the same as is given to
the difference of one point for Pupils B and C. Thus, relative ranks
reveal that a pupil is above or below another in achievement, but they
do not indicate in any way the magnitude of that difference. Relative
ranks also take no account of the actual achievement level at which
the accomplishment takes place. A pupil having a rank of 18 in a class
of 20 would be considered as having a low ranking in his group.
However, if he were found to rank eighteenth among 400 similar
individuals his position would indicate a significantly different type
of achievement. Percentile ranks, as one form of absolute ranking, take
this factor into account by reducing all ranks to a basis of 100 units.
A percentile rank of 100 means that the individual making the
specified score achieves at a level high enough to exceed 100 per cent
of a similar group without regard to the number in the group. In a
similar way, a percentile score of 75 means that the individual made a
score such that it exceeds that of 75 per cent of the individuals of
his group without respect to number.

Percentile scores are easily computed from frequency tables and
are very useful in comparing the achievement of pupils taking an
informal or non-standardized test. Percentile scores are also used very
widely in the interpretation of standard test scores at the
secondary-school and college level. The student will recognize the
seventy-fifth percentile as a measure with which he has already had
some contact. This percentile is the same as the third or upper
quartile (Q3). It is found by exactly the same methods as are used in
finding the median except that 75 per cent of the cases in the
distribution are counted out below the point on the scale assigned as
the seventy-fifth percentile. The same general methods are applied in
the determination of the twenty-fifth, fiftieth, or any other
designated percentile. The tenth, twentieth, thirtieth, and fortieth
percentiles, etc., are known as the deciles. These are very often used
in test interpretation.

The computation of the commonly used percentile scores is illus- 
trated in Table 49. Since all the processes involved here have been 
used in earlier work, the computation is presented without comment. 


TABLE 49
COMPUTATION OF PERCENTILE SCORES

Class Interval     f
162.5-167.5        1
157.5-162.5        1
152.5-157.5        0
147.5-152.5        3
142.5-147.5        2
137.5-142.5        5
132.5-137.5        4
127.5-132.5        2
122.5-127.5        5
117.5-122.5        3
112.5-117.5        9
107.5-112.5       11
102.5-107.5        8
 97.5-102.5        8
 92.5- 97.5        4
 87.5- 92.5        1
 82.5- 87.5        2
 77.5- 82.5        0
 72.5- 77.5        1
 67.5- 72.5        0
 62.5- 67.5        0
 57.5- 62.5        1
               N = 71

Percentile                                                             Test
Score       Interpretation                                             Score
  100       Score equaled or excelled by no student.                    167
   90       Score equaled or excelled by 10 per cent of students.       142
   80       Score equaled or excelled by 20 per cent of students.       135
   75       Third quartile: score equaled or excelled by 25 per cent
            of students.                                                128
   70       Score equaled or excelled by 30 per cent of students.       124
   60       Score equaled or excelled by 40 per cent of students.       116
   50       Median: score equaled or excelled by 50 per cent of
            students.                                                   112
   40       Score equaled or excelled by 60 per cent of students.       109
   30       Score equaled or excelled by 70 per cent of students.       105
   25       First quartile: score equaled or excelled by 75 per cent
            of students.                                                103
   20       Score equaled or excelled by 80 per cent of students.       101
   10       Score equaled or excelled by 90 per cent of students.        95
    0       Score equaled or excelled by practically all students.       58
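The counting-out process behind Table 49 can be sketched as follows, using the same interpolation within a class interval that serves for the median. The names `TABLE_49` and `percentile_score` are assumptions made for this illustration. The frequency for the 107.5-112.5 interval is taken as 11, the value that makes the frequencies total N = 71 and reproduces the tabled median of 112.

```python
STEP = 5.0  # width of each class interval in Table 49

# (lower limit of the class interval, frequency), lowest interval first.
TABLE_49 = [
    (57.5, 1), (62.5, 0), (67.5, 0), (72.5, 1), (77.5, 0), (82.5, 2),
    (87.5, 1), (92.5, 4), (97.5, 8), (102.5, 8), (107.5, 11), (112.5, 9),
    (117.5, 3), (122.5, 5), (127.5, 2), (132.5, 4), (137.5, 5), (142.5, 2),
    (147.5, 3), (152.5, 0), (157.5, 1), (162.5, 1),
]


def percentile_score(table, percent, step=STEP):
    """Score below which `percent` per cent of the cases in the table fall."""
    if not 0 < percent < 100:
        raise ValueError("percent must lie strictly between 0 and 100")
    n = sum(f for _, f in table)
    target = n * percent / 100  # number of cases to count out from the bottom
    below = 0                   # cases lying below the current interval
    for lower, f in table:
        if f > 0 and below + f >= target:
            # Interpolate within the interval that contains the target case.
            return lower + (target - below) / f * step
        below += f
    raise ValueError("frequency table contains no cases")


for p in (10, 20, 25, 30, 40, 50, 60, 70, 75, 80, 90):
    print(p, round(percentile_score(TABLE_49, p)))
```

Rounded to whole points, these values agree with the test scores listed in Table 49 for the tenth through the ninetieth percentiles. The entries for percentiles 0 and 100 are simply the lowest and highest scores made and are not produced by this interpolation.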




The interpretation of percentile scores frequently gives some
trouble to the worker inexperienced in their use. Fig. 23 is a graphic
presentation of the percentile scores given in Table 49. This figure
shows the characteristic curve (ogive) resulting from the use of
percentile scores. The heavy solid line in the figure represents the
results of an arbitrary smoothing of these percentile scores. This
smoothing process is frequently used when percentile scores are based
on fairly large populations and are set up as tentative norms for the
interpretation of the tests.


FIG. 23. Ogive Curve Based on Data in Table 49. (Vertical axis: test
scores; horizontal axis: percentiles. Broken line: percentile scores;
solid line: smoothed percentiles.)


VI. SUMMARY


This chapter presents a non-technical discussion of a few of the
common statistical tools which teachers of industrial arts will find 
useful in the analysis and interpretation of educational test results. 
Discussions of four of the six major statistical techniques outlined in 




the introductory paragraph of this chapter are presented. The funda- 
mental principles of the grouping and tabulating of test scores are 
stated and illustrated. The need for measures of dispersion is shown. 
The quartile deviation and the standard deviation are explained in 
some detail, and practical applications of these measures to problems 
of test analysis are made. The general meaning and the methods of 
correlation are given, along with a few definite hints concerning the 
interpretation of correlation coefficients. The practical uses and mean- 
ings of the ranking of test scores on both the relative and the absolute 
basis are discussed. The two remaining problems, dealing with the 
derivation and interpretation of test norms and standards, and the 
use of simple graphic methods of presenting the results of statistical 
analysis, are reserved for treatment in the following chapter. 

There has been no attempt to make this chapter a complete dis- 
cussion of all the interesting or even useful statistical techniques. To 
do this would require a volume in itself. As a matter of fact, the 
brevity of the treatment makes it impossible to present an adequate 
number of examples and illustrations to give the inexperienced worker 
sufficient experience with statistical problems. Real mastery of 
these skills can come only through repeated and continuous use. The 
student who is interested in achieving real skill and understanding in
this field will wish to make extensive use of the selected references
listed at the end of this chapter.


EXERCISES IN SUMMARIZING RESULTS OF TESTING

TABULATING TEST SCORES


Problem 1

a. Arrange or rank these scores from an objective examination in
   woodworking in descending order:

   95, 99, 40, 44, 68, 84, 54, 60, 91, 58, 66, 66, 72, 87, 77, 76, 65, 66, 70, 89, 77,
   80, 78, 78, 62, 64, 64, 64, 54, 57, 58, 63, 93, 90, 76, 59, 62, 69, 70, 85, 72, 73,
   83, 71

b. What is the largest score made on this test?
c. What is the smallest score made on this test?
d. What is the range of the scores?
e. If a frequency table with a step of 3 is made, how many steps will
   be required?
f. What will be the limits of the step required for the largest score?
g. What will be the mid-point of this step?
h. Make a frequency table of these scores using a 3-point step and
   mid-points divisible by the size of the step. Do your work on the
   left half of a sheet of paper and preserve it for use in later
   problem work.
i. If your work is right, the frequencies reading from the top will be
   as follows:

   1, 1, 2, 2, 1, 3, 1, 4, 2, 3, 5, 4, 6, 2, 3, 2, 0, 0, 1, 0, 1

   N = 44




COMPUTING THE ARITHMETIC MEAN

Problem 2

Compute the arithmetic mean from the frequency table prepared in Problem 1.
(Answer = 70.4 from frequency table with step of 3.)


COMPUTING THE MID-MEASURE AND THE MEDIAN

Problem 3

Compute the median from the frequency table in Problem 1.
(Answer = 69.7.)

Problem 4

Find the mid-measure for the scores given in Part a of Problem 1.
(Answer = 70.0.)


COMPUTING MEASURES OF VARIABILITY

Problem 5

Find the quartile deviation for the scores tabulated in Problem 1.
(Answer = 8.5.)

Problem 6

Find the standard deviation of the scores in the table prepared for Problem 1.
(Answer = 13.3.)


COMPUTING MEASURES OF RELATIONSHIP

Problem 7

The following paired scores were obtained by giving the same form of an
objective examination two times to the same pupils:

         1st    2nd               1st    2nd
Pupil    Test   Test     Pupil    Test   Test
  A       61     67        N       52     55
  B       56     60        O       44     49
  C       73     76        P       40     41
  D       67     70        Q       33     33
  E       53     49        R       58     59
  F       48     52        S       76     77
  G       43     44        T       70     75
  H       35     31        U       63     63
  I       23     25        V       50     50
  J       57     56        W       41     46
  K       78     81        X       36     37
  L       71     73        Y       34     31
  M       65     67        Z       25     27

a. Prepare a correlation table of these 26 pairs of scores. Use a 3-point step on
   both axes. Compute the coefficient of correlation as a basis for expressing
   the reliability of the objective examination. (Answer = +.959.)




COMPUTING PERCENTILE RANKS 
Problem 8 


Use the frequency table for the first test scores tabulated in Problem 7, and 
compute the percentile scores for each of the deciles as shown in Table 49. 
Check your own work for accuracy. 


SELECTED REFERENCES 


Garrett, H. E., Statistics in Psychology and Education. New York: Longmans,
Green and Company, 1926.

Good, Warren R., The Elements of Statistics. Ann Arbor: The Ann Arbor
Press, 1933.

Greene, H. A., Work-Book in Educational Measurements. New York: Longmans,
Green and Company, 1930.

Greene, H. A., and Jorgensen, A. N., The Use and Interpretation of Elementary
School Tests. New York: Longmans, Green and Company, 1935.

Gregory, C. A., Fundamentals of Educational Measurement. New York: D.
Appleton Company, 1922.

Holzinger, Karl J., Statistical Methods for Students in Education. Boston:
Ginn and Company, 1928.

Holzinger, Karl J., Statistical Tables for Students in Education and Psychology.
Chicago: University of Chicago Press, 1925.

Holzinger, Karl J., and Mitchell, B. C., Exercise Manual in Statistics. Boston:
Ginn and Company, 1929.

Kelley, T. L., Statistical Method. New York: The Macmillan Company, 1923.

Kramer, Edna E., Educational Statistics. New York: John Wiley and Sons,
Inc., 1935.

Lindquist, E. F., and Stoddard, George D., Study Manual in Elementary
Statistics. New York: Longmans, Green and Company, 1929.

Odell, C. W., Educational Statistics. New York: The Century Company, 1925.

Otis, Arthur S., Statistical Method in Educational Measurement. Yonkers,
New York: World Book Company, 1925.

Rugg, Harold O., A Primer of Graphics and Statistics for Teachers. Boston:
Houghton Mifflin Company, 1925.

Trabue, M. R., Measuring Results in Education. New York: American Book
Company, 1924.




CHAPTER XV 
INTERPRETING THE RESULTS OF TESTING 


I. THE RESULTS OF TESTING


123. The Meaning of a Test Score.

It is important in this chapter on the interpretation of the results
of testing to define clearly what is meant by a test score. In order
to accomplish this, two or three new concepts may require explanation.
In the first place, a test score is a numerical expression of
performance on the part of an individual. Sometimes the test score is
merely the number of exercises responded to correctly. Sometimes it is
an arbitrarily defined scale value. But whatever its form, its function
is to reveal in a quantitative way the performance of an individual
as he responds to stimuli given under certain conditions. This leads
to the second concept involved in the meaning of a score. The test
score is an evidence of performance. Performance, the response of the
individual to the test situation, is the expression of ability operating
under certain conditions. The pupil may make a poor score because
he does not have the ability to do better; he may not know the facts.
On the other hand, he may make a low score because of certain physical
conditions: illness; discomfort; poor hearing, sight, or illumination;
a broken pencil; a dislike for the subject, the teacher, or examiner;
a failure to give attention to and to comprehend the directions,
etc. Any one of these or a dozen other factors may affect the score.
Accordingly, there is the possibility and even likelihood of a serious
error in the assumption that a test score is a direct evidence of
ability. The conditions under which the performance takes place must be
known before it is safe to infer ability from performance.

Ability, as an abstract concept, may be defined as the power to do.
Power to do, to respond to stimuli and to situations, is the product of
training and experience. This suggests that, unless training and native
capacity factors are known, inferences as to abilities may be
misleading. This point becomes particularly serious in the
interpretation of mental-test results, for it is common practice for
users of mental tests to infer innate capacity (mental ability) from
performance scores. The real seriousness of this type of uncritical
inference may be seen by comparing the interpretations of an
achievement-test score and a mental-test score. Both are basically
expressions of performance. Equal abilities may be inferred from equal
scores from both types of tests if and when all the conditions under
which they are given are definitely under control. Although it is
difficult to make sure that all physical and physiological factors are
adequately controlled in a testing situation, it is possible to
regulate most of the mechanical conditions within reasonable limits.

The significant point to note here, however, is the fact that users
of achievement tests stop with an inference of equality of ability from
equal performance scores, but users of mental tests are obliged to take
a further inference. In the interpretation of mental-test results it is
common practice to infer equal native capacity from apparent evidences
of equal abilities. The fallacies in this argument and the dangers of
this step must be readily apparent. Equal capacities may be inferred
from performance scores only when there is direct and positive evidence
of two things: first, that the conditions under which the testing took
place were identical and equally well controlled; second, that the
training opportunities of the individuals compared have been equal.
The mechanics of testing now make it fairly easy to control testing
conditions. The second factor represents a real stumbling-block in the
way of an accurate and sane interpretation of the mental-test results.
The naive manner in which some makers and many users of mental tests
assume equality of learning opportunity, and hence equal capacity from
equal performance scores on mental tests, is one of the things which
has made teachers and students skeptical of their value.

It is possible that the foregoing discussion of the meaning of a test
score may appear to indicate that it is impossible to give meaning to
any kind of a test score. Such is not the intention, even though the
[...] a plea for a conservative attitude in [...] that is known about
the variables underlying test scores, the more critical must the user
become. The greatest damage that has been done to the field of
educational measurements in the past [...] of carelessness and
ignorance on [...] tendency to draw unw[...] control the mechanical
con[...] and draw sane and defensible conclusions.




124. Giving Meaning to Informal Test Scores. 


The user of industrial education tests in the classroom is con- 
fronted with two types of test data for interpretation. The first type, 
and undoubtedly the more common of the two, deals with the results 
of informal, teacher-made tests. The results from these home-made 
tests in turn are of two types: the subjective scores assigned by teach- 
ers to pupils’ responses to essay-type tests, and the performance scores 
resulting from informal objective examinations. Although something 
can be done to improve the interpretation of the relatively unreliable 
marks assigned to the discussion-type exercise, the performance scores 
resulting from reasonably long and reliable objective examinations are 
much more important measures of achievement, and as such deserve 
complete and accurate interpretation. The second type of educational 
test data requiring interpretation arises, of course, from the results of 
using standard tests. Since one of the major functions of the standard- 
ization of a test is the establishment of meaning for the test scores, 
many more types of interpretation are possible for data of this type. 
Purely for convenience in the organization of this discussion, problems 
of the interpretation of standard test scores are considered first. 


II. NORMS AND STANDARDS


125. The Meaning of Standardization. 

Early in the history of objective testing in the classroom prac- 
tically all that was required for development of a so-called standard- 
ized test was to give a few reasonably suitable test exercises to a 
hundred or more pupils in different school systems. These results 
were then compiled and submitted as norms. In fact, for many years 
almost the only real difference between a standardized test and a rea- 
sonably good informal objective test was the fact that the former 
had been tried out in a larger number of different classes. Test 
standardization as it is now interpreted means much more than the 
mere derivation of norms, although the existence of norms is still one 
of the chief distinctive features of the standard tests. There has been
much improvement in both the informal test and the more formal
standardized test.

In terms of present-day test-construction practices the
standardization of a test involves a long period of experimentation
with a large body of subject-matter exercises. After the subject-matter
field to be tested has been decided upon, there is the very difficult
problem of selecting the more important areas of this field to be
sampled. Many times experimental evidence must be secured before it is
possible to



decide upon the best type of test exercise to use. Even then, many of 
the exercises prepared in preliminary form for the test are found to be 
badly stated or to be totally unsuited for the type of test to be con- 
structed. Usually from four to six times as many exercises must be 
prepared in the preliminary work with a test as will appear in the 
test itself when in final and standardized form. Special care must be 
taken to see that a suitable range of difficulty is provided in the items, 
and that multiple items covering certain of the more important skills 
are prepared in parallel form so that these items may be adequately 
sampled in the several forms of the test which must be prepared. 
After the exercises themselves have been written in preliminary form 
they must be tried out under experimental conditions in typical class- 
rooms for the purpose of discovering the faulty or ambiguous items 
and for the additional purpose of discovering the relative difficulty of 
the several items. From the results of this preliminary use of the 
exercises two or more roughly scaled forms of the tests may be set up 
for further experimental use. From the results of this second trial it is 
usually possible to equate the forms of the tests quite closely by shift- 
ing hard and easy items from one form to another until approximate 
equality is reached. Then the tests are ready for a further trial in 
a large number of representative classes for the purpose of further 
equating the forms and establishing norms. It is thus apparent that
while standardization is only one of the final steps in the preparation
of a carefully made test, it is this extensive sampling of the results of
the use of the test in many classrooms which affords the basis for the
assignment of meaning to the test scores.


126. Meaning of Norms.


[...] They are obtained by giving the [...] representative sampling of
pupils in the [...] to the group which the teacher [...] the sampling
is distributed over a [...] situations and the conditions under which
the tests are to be administered are rigidly followed, the norms
furnish a reliable and useful basis for interpretation.


127. Standards and Norms. 

The use of the term standardized in the discussion of tests of the 
type for which norms are provided has led to the development of a 
careless tendency to treat the words “standards” and “norms” as being 
synonymous. The process of securing the data for the critical analysis 
of tests and the derivation of suitable norms is properly known as 
standardizing. However, the term standards, when used to refer to
levels of pupil achievement, implies an ultimate goal to be achieved.
Standards may not actually be reached by any individual, but they 
are levels of achievement toward which to strive. Norms are the levels 
of achievement which typical pupils actually attain. It is clear that, 
in the light of these definitions, few tests are accompanied by 
standards. 


128. Specific Uses of Test Norms. 

Although the general function of test norms is to provide a basis 
for the interpretation of test scores, several specific uses should be 
pointed out at this time. For example, test norms give meaning to 
the test score. There is no way of determining except through com- 
parison with the norm for a test whether a given score is high or low. 
To be explicit, is a score of 96 points on the Newkirk-Stoddard Home 
Mechanics Test a high, low, or average score for a pupil to make at the 
end of the year’s course in general shop work? A reference to the 
norms given in Table 50 will give an answer to this question. As a
matter of fact, such a score is so good that only 25 pupils in a
hundred may be expected to do better than that.

Norms point out to both pupil and teacher the actual levels or goals 
of achievement which both should attempt to attain. That is, the 
norms tell all parties concerned how far they have to go and approxi- 
mately when they have arrived. Norms provide almost the only ob- 
jective basis for the analysis of individual pupil weaknesses. Certain 
of the better achievement tests in special subject fields are made up 
of a number of different test-parts designed to measure distinct 
aspects of achievement in the subject. Many of these tests are pro- 
vided with separate norms for the special parts of the tests making it 
possible to reveal the pupil’s standing in each of the independent test- 
parts. (See Table 51.) 

Norms for achievement tests used in connection with results from
mental tests make it possible to determine within practical limits
whether the pupil is working up to the real ability he possesses. The
basis for this type of analysis of accomplishment is found in the
comparison between the mental ability of the pupil expressed in terms
of his mental age and his educational achievement as represented by his
educational age.


TABLE 50
END-OF-YEAR NORMS FOR NEWKIRK-STODDARD HOME MECHANICS TEST

      Point Scores
Form A        Form B
 74.7          74.6
 77.9          73.0
 95.8          96.3
 60.5          54.5
 25.1          27.9


TABLE 51
SCORES ON IOWA SILENT READING TEST: ELEMENTARY, BY A NINTH-GRADE PUPIL

                                              Part     Test
Test                                          Score    Score
1. Paragraph meaning
   A. Science                                  18       40
   B. History                                  22
2. Word meaning
   A. General vocabulary                       24       42
   B. Subject-matter vocabulary                18
3. Selection of central idea of paragraph      10       10
4. Sentence meaning                            28       28
5. Location of information
   A. Alphabetizing                             3       12
   B. Use of the index                          9
Total comprehension score                              132
6. Rate of silent reading                               27


129. Kinds of Norms.


[...] accompanies a test depends to a large [...] school system at
which the test is used. [...] somewhat by the nature of the test
itself. [...] use in the elementary-school grades are [...] norms, and
age norms. Tests intended for use [...] school grades are usually
[...]. Age norms do not [...] grade levels, because so [...]
achievement. Then, too, the curve of mental growth flattens out very
rapidly in the upper grade levels so that the increments of growth in
achievement from age to age at the upper levels are not significant.
In place of the age norms for secondary-school and college tests, the
common practice today is to provide quite detailed tables of
percentile equivalents for the point scores.

In the lower grades, age as well as grade norms are usually pro- 
vided with achievement tests. In general, the type of norm is deter- 
mined by the method of grouping the scores when the tabulation for 
the norms is made. If the pupils are grouped by grades, without re- 
spect to age or school progress, the resulting norms are grade norms. 
If the pupils are classified in accordance with some specific age-scale 
as the basis for the tabulation, the resulting norms are age norms. 
In the derivation of grade norms for standard tests it is desirable to 
have the norms clearly indicate the period they are designed to cover. 


III. RESULTS DERIVED FROM NORMS


130. Grade Levels. 

Test scores accompanied by a fairly reliable set of grade norms 
can be expressed in terms of the relative position of these scores with 
respect to these grade norms. In fact, this is one of the very simple 
and convenient methods of changing test scores into a form which 
even the child or his parents can understand. The fact that an indi- 
vidual pupil is in the seventh grade or the eighth grade has come to 
have some meaning to the average pupil or parent. An isolated test 
score cannot have this meaning. However, as soon as a test score 
is identified with a specific grade level of accomplishment it takes on 


a definite meaning. 


TABLE 52
GRADE NORMS FOR HAGGERTY READING EXAMINATION: SIGMA 3

Grade    5    6    7    8    9    10    11    12
Score   40   54   68   80   93   104   112   118


The method of deriving the grade levels (so-called G-scores) from
grade norms is illustrated from the revised grade norms for the
Haggerty Reading Examination: Sigma 3. Table 52 shows the scores
to be expected at the end of the year for each grade. From this table
it is apparent that a score of 93 is the ninth-grade end-of-the-year
norm; a score of 104 is the end-of-the-year norm for the tenth grade,
etc. Thus a student making a score of 104 points on this test may
be described as achieving at a level equal to an average pupil at
the end of the tenth grade. This value may be simply expressed as
10¹⁰. A pupil making a score of 93 points on this test may be
assigned a grade-level position of 9¹⁰, meaning that his achievement is
comparable to that expected at the end of the ninth grade. Pupils
making test scores between 93 and 104 may be assigned grade levels
corresponding to the proportion of the distance between the end
of the ninth (beginning tenth) grade and the end of the tenth
(beginning eleventh) grade work the scores represent. To illustrate,
the score-point distance from 93 to 104 is 11 points. A score of 95
would therefore represent a grade-level distance which is
two-elevenths of the way past the beginning of the tenth grade. For
practical purposes this is two-tenths of a grade. Thus a score of 95 on
this test corresponds roughly to a grade position of 10².
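The interpolation just described may be sketched in code. The names `HAGGERTY_NORMS` and `grade_level` are assumptions made for this illustration, and the result is returned as a pair (grade, tenths of the grade completed) rather than in the superscript notation of the text. Rounding to the nearest tenth is likewise an assumption, chosen because it matches the worked example of two-elevenths becoming two-tenths.

```python
# End-of-year norms from Table 52: grade -> expected score.
HAGGERTY_NORMS = {5: 40, 6: 54, 7: 68, 8: 80, 9: 93, 10: 104, 11: 112, 12: 118}


def grade_level(score, norms=HAGGERTY_NORMS):
    """Return (grade, tenths) for a score lying between two end-of-year norms."""
    grades = sorted(norms)
    for lower, upper in zip(grades, grades[1:]):
        low_norm, high_norm = norms[lower], norms[upper]
        if low_norm < score <= high_norm:
            # Proportion of the distance from the end of the lower grade
            # (beginning of the upper grade) to the end of the upper grade.
            fraction = (score - low_norm) / (high_norm - low_norm)
            tenths = int(fraction * 10 + 0.5)  # nearest tenth of a grade
            return upper, tenths
    raise ValueError("score lies outside the range covered by the norms")


print(grade_level(95))   # (10, 2): two-tenths of the way through grade 10
print(grade_level(104))  # (10, 10): end of the tenth grade
print(grade_level(93))   # (9, 10): end of the ninth grade
```

Scores at or below the lowest norm, or above the highest, are rejected here because the table gives no basis for extending the scale beyond the grades it covers.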


131. Age Scores. 


Since age equivalents as derived scores have been discussed in this 
chapter in connection with the meaning of norms, and since they are 
not considered by most test workers to be of very great significance 
in the junior-high-school and secondary-school grades they are given 
no extended treatment here. 


132. Percentile Ranks.


One of the favorite ways of interpreting test scores in the second- 
ary-school and college field is to use percentile ranks. Percentile 
ranks are of particular value in the interpretation of informal and non- 
standardized tests, since they permit the comparison of each indi- 
vidual in the group with the group of which he is part. In contrast 
with the method of assigning ranks by relative position, the cal- 
culation of the percentile rank expresses the absolute position of the 
individual pupil in his relation to the rest of his group. The calcu- 
lation of percentiles involves the division of the total distribution into 
100 equal parts, hence the term percentile. Achievement as repre- 
sented by a test score is expressed as a position in a population of 
100 cases. A score representative of high achievement ranks high in 
the percentile scale and is excelled by only a small number of cases. 


[...] meaning that such a score is so high that it almost certainly
will not be excelled by any one. Table 53 presents the percentile
norms for this test in a convenient form for transferring each
possible test score into percentile equivalents. Since percentile
scores represent the position of the individual score in a distribution
of infinite population, they are very convenient devices for turning
test scores from unlike scales into comparable measures. There are
many occasions in the interpretation of educational-test results,
particularly in experimental situations, when this is very desirable.


TABLE 53
IOWA PLANE GEOMETRY APTITUDE TEST: PERCENTILE EQUIVALENTS OF TEST SCORES

N = 413 (girls 199; boys 214)

               Percentile                    Percentile
Score          Equivalents     Score         Equivalents
72 or more        100          34               55
65 to 72           99          33               53
63 to 64           98          32               50
60                 97          31               46
59                 96          30               42
58                 95          29               40
57                 94          28               36
55 to 56           93          27               33
54                 92          26               30
53                 91          25               27
51 to 52           90          24               25
50                 88          23               23
49                 87          22               20
48                 85          21               18
47                 83          20               16
46                 81          19               14
45                 80          18               12
44                 78          17               10
43                 76          16                9
42                 75          15                8
41                 73          14                7
40                 70          13                6
39                 68          12                5
38                 66          11                4
37                 63          9 to 10           3
36                 60          7 to 8            2
35                 58          6                 1
                               0 to 5            0


133. Intelligence Quotients. 
The discovery and use of the concept of mental age made possible 


the development of the quotient idea. In a general way, all quotients 
derived from results of measurements express the development of the 
individual as related to average expectancy for his age or mental level. 




Scores on mental tests provide the basis for the derivation of mental 
ages. Scores from achievement tests, provided the tests are accom- 
panied by age norms, may be expressed as achievement or subject ages. 
The ratio between the mental age of an individual student and his 
chronological age is called an intelligence quotient. If an achieve- 
ment age is used, the resulting quotient is an educational quotient. 
The intelligence quotient (I.Q.) as found in practice is the result 
of dividing the mental age (M.A.) of the individual by his chrono- 
logical age (C.A.), both expressed in months. The result of this divi- 
sion is expressed as a whole number by multiplying the quotient by 
100. An illustration will make this procedure clear. Let us assume
that a pupil who is twelve years and four months of age makes a
score on a mental test which gives him a mental-age equivalent of
eleven years and three months. At the outset, it is clear that since
his mental age is less than his chronological age, he has not made
quite normal development in mental ability. That is, his I.Q. will be
somewhat less than 100. Actually the I.Q. of this individual is

$$\text{I.Q.} = \frac{\text{M.A.}}{\text{C.A.}} \times 100 = \frac{135}{148} \times 100, \text{ or } 91$$


An intelligence quotient of 100 indicates normal development on
the part of the individual. A quotient of less than 100 means that
there is more or less retardation in the development, and a quotient
of above 100 means more or less accelerated development. It is com-
mon practice for examiners in the psychological clinic to consider I.Q.'s
of 90 to 110 as average or approximately normal. Quotients above
110 are considered superior in proportion to the extent to which they
exceed that value. Similarly, quotients of less than 90 are below
average and inferior in proportion to the amount which they fall
below that value. I.Q.'s of very high and very low levels are naturally
relatively rare. All these interpretations of the intelligence quotient
are of course dependent upon the reliability of the measuring instru-
ment on which they are based.
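The arithmetic of the quotient may be sketched in a few lines. The function names are hypothetical and serve only for illustration; the ages are those of the pupil in the example above, and the three descriptive labels merely restate the clinical ranges just given (90 to 110 as average).

```python
def to_months(years, months):
    """Convert an age given in years and months to months."""
    return 12 * years + months


def quotient(derived_age_months, chronological_age_months):
    """Ratio of a derived age (mental or educational) to chronological
    age, multiplied by 100 and rounded to the nearest whole number."""
    return round(100 * derived_age_months / chronological_age_months)


def describe(iq):
    """Restate the ranges given in the text; the labels are illustrative."""
    if iq > 110:
        return "superior"
    if iq < 90:
        return "below average"
    return "average"


mental_age = to_months(11, 3)          # 135 months
chronological_age = to_months(12, 4)   # 148 months
iq = quotient(mental_age, chronological_age)
print(mental_age, chronological_age, iq, describe(iq))
```

The same `quotient` function would serve for an educational quotient if an educational age were supplied in place of the mental age.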


134. Educational Quotients.

Many of the better achievement tests designed for the elementary
and junior-high-school grades are accompanied by age norms
which permit the expression of scores as educational ages.
These educational-age scores make it possible to derive an educa-
tional quotient by following a procedure identical with that used in
deriving the intelligence quotient. Since age norms have been found to
be impractical for most of the educational achievement tests designed
for high school, educational quotients have not been widely used




in secondary-school measurement. Subjects such as make up the bulk 
of the industrial arts field do not lend themselves especially well to 
standardization on an age basis. Hence there is not very great like- 
lihood of using these educational quotients in measurement in indus- 
trial education. The same comment appears to hold for the accom- 
plishment quotient (A.Q.), a ratio designed to indicate the relative 
degree to which an individual student is utilizing his capacity to 
achieve. The basic idea back of the accomplishment quotient has re- 
ceived some consideration in an earlier section of this chapter. 

The various quotients and other measures utilized in the interpre- 
tation of educational-test results can be made effective servants of the 
teacher only through extensive experience in their use. A complete
control can be gained only through practice in their calculation and
interpretation. Mastery can be retained only through continued use.¹


IV. RESULTS FROM INFORMAL OBJECTIVE TESTS 


135. Objectifying the Marking System. 

A critical examination of the marking system and the marks as- 
signed by teachers makes it very apparent that some radical improve- 
ments in these phases of educational measurement are needed. As a 
result of an extensive survey of the problem, and a study of the recom- 
mendations of educators who have studied the marking system, the 
following program for eliminating many of the unsatisfactory features 
of the present methods of assigning marks is submitted: 


1. Discard the practice of marking pupils in percentages. Four
reasons are advanced for this decision: (a) the percentage scale has 
for its only fixed points 0 and 100. The former means just no ability 
while the latter means perfect mastery. Yet the complete scale is 
practically never used in practice. (b) The establishment of the limits 
of the scale fixes the intermediate values. Accordingly, the difference 
between marks of 75 and 76 should be the same as the difference be- 
tween marks of 97 and 98. Common observation reveals the absurdity 
of this assumption. (c) The use of the percentage scale presupposes 
that the teacher is able to distinguish as many as 101 minute differ-
ences in accomplishment. Experimental evidence ² reveals that teach-
ers are able to distinguish from five to seven levels of ability. To use
a finer scale assumes an exactness of discrimination on the part of
teachers which does not exist. (d) The use of an arbitrarily selected
percentage as a passing mark, as is very commonly done, results in
throwing the marks into a badly skewed distribution with too large a
proportion of the marks piled up at or near the passing mark.

¹ Extensive opportunity for practice in the derivation of grade equivalents,
age scores, percentiles, and quotients of various types is provided in the Work-
Book in Educational Measurements (Longmans). See particularly Problems
30 to 33.

² Ruch, G. M., The Objective or New-Type Examination, Scott, Foresman
and Company, Chicago, 1929, pages 370-374.


2. Each mark assigned to a pupil should be a symbol designed to
indicate his power to do. This symbol should be defined in exactly
worded statements, understood alike by teachers, administrators, and
pupils.

The following definitions of letter grades by Hillbrand ³ are given
as an illustration of the type of statements that should be prepared
by the teacher for the purpose of defining each of the letter steps in
the five-point scale.


GRADE   DEFINITION

A        1. Consistently does more than is required.
         2. Has wide vocabulary at his command.
         3. Is always alert; takes active part in discussions.
         4. Has unusual dependability in taking assignments.
         5. Is prompt, neat, and thorough in all work, and unusually free from
            teachers' correction.
         6. Knows how to select books, tools, materials, and is a rapid worker.
         7. Has initiative and originality in attacking problems.
         8. Has ability to associate and rethink the problem and can adapt him-
            self to new and changing situations.
         9. Has enthusiasm for and interest in his work.
        10. Has ability to apply ideas gained in study to everyday life.

B        1. Frequently does more than is required.
         2. Has good vocabulary and speaks with conviction.
         3. Unusually alive to the situation at hand.
         4. Careful in complying with assignment.
         5. Eager attack on new problems; profits from criticism.
         6. Prompt, neat, thorough, and unusually accurate in all work.
         7. Has ability to apply general principles of the course.

C        1. Does what is required.
         2. Possesses a moderate vocabulary.
         3. Willing to apply himself during class hour.
         4. Does daily preparation with comparative freedom from carelessness.
         5. Attentive to assignment.
         6. Has ability and willingness to comply with instructions and a cheer-
            ful response to correction.
         7. Reasonably thorough . . .

D        1. Usually does what is required.
         2. Attendance often irregular.
         3. Tools and equipment sometimes lacking.
         4. Frequently "misunderstands" assignment.
         5. Willing but slow in complying with instructions and corrections.
         6. Careless in preparation of assignments.
         7. Lacking in thoroughness and sometimes tardy with work.
         8. Careless in presentation of work.

Fd       1. Usually does a little less than is required.
         2. Listless and inattentive in class.
         3. Tools and equipment for work often lacking.
         4. Always tardy with work.
         5. Seldom knows anything outside the lesson.
         6. Retains only fragments of the general principles of the course.
         7. Lacking in qualities of the first three groups to the extent that he
            cannot or will not do the work.

³ School and Society, Vol. 21: 142, January 31, 1925.


3. Each teacher should give objective examinations or quizzes
frequently throughout the term, and the scores from these tests should 
afford the major basis for his marks. Prior to the assignment of 
marks for a school period or semester the pupils should be ranked on 
their test scores and these scores should then be transformed into 
marks on a five-point letter scale by the use of the standard deviation 
technique in large sections or classes (thirty or more). In small 
classes (less than thirty) this may be accomplished somewhat more 
simply by dividing the distribution of scores into five groups and as- 
signing the designated mark to previously determined percentages of 
the class. 

The letter grades used and the typical percentages of the class as-
signed each grade under these conditions are as follows:

Letter Grades             A       B       C       D       Fd
Percentage of class      4-6    19-21   48-52   19-21    4-6
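For a small class, the division of ranked scores into five groups may be sketched as follows. This is a rough illustration, not the procedure of Chapter XIV: the shares of 5, 20, 50, 20, and 5 per cent are simply the mid-points of the ranges in the table above, the function name is hypothetical, and the rule that tied scores receive the same mark is an assumption added here.

```python
from collections import Counter

# Mid-points of the percentage ranges given in the text (an assumption).
SHARES = (("A", 5), ("B", 20), ("C", 50), ("D", 20), ("Fd", 5))


def marks_by_percentage(scores, shares=SHARES):
    """Rank the scores and assign letter marks to fixed shares of the class."""
    n = len(scores)
    # Cumulative cut positions in the ranked list, e.g. 1, 5, 15, 19, 20.
    cuts = []
    cumulative = 0
    for letter, percent in shares:
        cumulative += percent
        cuts.append((letter, round(cumulative * n / 100)))

    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    marks = [None] * n
    previous_score, previous_mark = None, None
    for rank, i in enumerate(order):
        if scores[i] == previous_score:
            mark = previous_mark      # tied scores share one mark
        else:
            mark = next(letter for letter, cut in cuts if rank < cut)
        marks[i] = mark
        previous_score, previous_mark = scores[i], mark
    return marks


# Hypothetical class of twenty pupils with scores 1 through 20.
class_scores = list(range(1, 21))
class_marks = marks_by_percentage(class_scores)
print(Counter(class_marks))
```

With twenty pupils the shares come to one A, four B's, ten C's, four D's, and one Fd. In very small classes the rounding of the cut positions becomes coarse, so the teacher's judgment must settle the borderline cases.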


The essential steps in the assignment of grades by the standard
deviation method are outlined in Chapter XIV. The actual solution
of a problem utilizing this method in the assignment of marks to ob-
jective-test scores from a class of forty-five pupils is shown in
Table 45, page 208.

4. Require teachers to prepare in advance for each six-weeks pe-
riod carefully worded statements of the objectives of each subject for
that period. Unless this is done, no one can determine whether or not
the pupils are being tested on the points on which they should be
tested. This statement of objectives should be the criterion by which
the validity of the objective tests is determined.

5. Work prepared for daily assignments should be treated as a
requirement of the course, but marks assigned should be determined by
numerous brief objective quizzes or tests upon the work assigned.

6. Notebook and laboratory work should be treated as a require- 
ment of the course, and credit should be deducted or withheld for work 
which is unsatisfactory or incomplete. However, the marks assigned 
should be determined by frequent objective tests on the work rather 
than on the basis of the notebook or laboratory work which may or 
may not be the pupil’s own work. 

7. Assign marks on "accomplishment" or "performance" rather
than on indefinite subjective factors such as effort, attitude, ability,
etc.

8. Final grades summarizing all the quiz and test grades for the 
course can be obtained quite readily by assigning point values to each
letter grade, computing the actual average for each pupil, and then 
re-assigning the class marks on the basis of these averages. This is a 
very simple way of assigning final grades for fairly large groups and 
in courses in which a relatively large number of objective marks are 
to be summarized in the final grade. It also permits the application 
of a definite schedule of weighting for certain period and final tests 
in accordance with the teacher’s judgment of their importance. 

The accompanying table of point-values (Table 54) corresponding
to specific letter grades may be useful to the teacher. Values are
suggested for plus and minus values of the letter grades as one means
of softening some of the shock from the arbitrariness of letter grades
assigned on the basis of the normal curve. Students whose test scores
fall just below the point where a superior grade is given sometimes
feel that this is a distinct element of unfairness in the system. As-
signing the plus and minus letter grades to their quiz scores serves to
take care of this problem quite adequately.

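The averaging described in recommendation 8 may be sketched as follows, using the point values of Table 54. The pupils, their marks, and the double weight given to the final test are hypothetical, and the function name is introduced only for illustration.

```python
# Point values from Table 54.
POINTS = {
    "A+": 16, "A": 15, "A-": 14,
    "B+": 12, "B": 11, "B-": 10,
    "C+": 8, "C": 7, "C-": 6,
    "D+": 4, "D": 3, "D-": 2,
    "Fd": 0,
}


def weighted_average(marks):
    """Average the point values of (letter, weight) pairs."""
    total_points = sum(POINTS[letter] * weight for letter, weight in marks)
    total_weight = sum(weight for _, weight in marks)
    return total_points / total_weight


# Hypothetical pupils: four quizzes of weight 1 and a final test of weight 2.
term_marks = {
    "Adams": [("B+", 1), ("C", 1), ("A-", 1), ("B", 1), ("C+", 2)],
    "Baker": [("A", 1), ("A-", 1), ("B+", 1), ("A", 1), ("A-", 2)],
    "Clark": [("C", 1), ("D+", 1), ("C-", 1), ("C", 1), ("D", 2)],
}

averages = {name: weighted_average(marks) for name, marks in term_marks.items()}

# Rank the class on the averages before re-assigning final marks.
ranking = sorted(averages, key=averages.get, reverse=True)
print(averages)
print(ranking)
```

The averages themselves are not final marks. As the recommendation states, the class marks are re-assigned on the basis of these averages, for example by ranking the pupils and applying the same five-point procedure used for single tests.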


TABLE 54

SUGGESTED POINT VALUES CORRESPONDING TO LETTER GRADES

Grade     Points
A+          16
A           15
A-          14
B+          12
B           11
B-          10
C+           8
C            7
C-           6
D+           4
D            3
D-           2
Fd           0


V. SUMMARY


This chapter deals with the practical steps in the analysis of test
results which make it possible for the classroom teacher to utilize and
profit from these results.

The acceptance of the notion that a test score is merely a numeri-
cal expression of performance which, subject to the conditions operat-
ing at the time, reveals the ability of the individual is essential to a
safe and sane interpretation of the meaning of test scores. A recog-
nition of the inferences involved in the interpretation of tests of gen-
eral or mental ability will do much to protect against the over-inter-
pretation of the results of such tests.

The meanings of the terms standardization, standards, and norms
are clearly brought out and illustrated in this chapter. The various 
types of derived scores likely to be useful to the teacher of industrial 
education are discussed. 

Since the major use of tests by the classroom teacher is in the
evaluation of achievement, the importance of the informal objective
test and other teacher-made measures is emphasized. Present ten- 
dencies are distinctly in the direction of the more systematic use of 
such instruments as the most important single basis for the assignment 
of teachers’ marks. This practical aspect of measurement is so im- 
portant that considerable attention is given in this chapter to the 
discussion of possible methods of improving the marking system. After 
all, the marking system is the one phase of educational measurement 
with which practically every teacher comes into close contact. 




EXERCISES IN INTERPRETING RESULTS OF TESTING 


1. Elaborate your interpretation of the basic concepts underlying the meaning
of a test score.

2. Show by illustration the real differences between test norms and test standards.

3. By referring to Table 52, compute the grade levels (G-scores) corresponding to
scores of 74 and 100.

4. Find the intelligence quotients (I.Q.) for two individuals each 12 years 5
months whose mental ages are 11 years 9 months, and 13 years 11 months,
respectively.

5. Criticize the recommendations given for the objectification of the marking
system given on pages 235 to 238.

6. In your opinion do the definitions of the meaning of the various letter grades
have any place in a program for the objectification of teachers' marks?

7. Show how you would average the following letter grades to secure a term
final grade, assuming all grades to count the same except the final examina-
tion grade which is allotted triple weight. What final grade would you
assign?

First test: B            Fifth test: A
Second test: C           Sixth test: C
Third test: B            Final examination: B
Fourth test: A

8. Using the standard deviation technique as illustrated in Table 45, assign letter
grades to the objective-test scores secured from the second test given in
Problem 7, Chapter XIV.


SELECTED REFERENCES 


Bangs, C. W., and Greene, H. A., "Teachers' Marks and the Marking System,"
University of Iowa Extension Bulletin, No. 244, May 15, 1930, Iowa City,
Iowa.

Brueckner, Leo J., and Melby, E. O., Diagnostic and Remedial Teaching.
Boston: Houghton Mifflin Company, 1931.

Buckingham, B. R., Research for Teachers. New York: Silver, Burdett and
Company, 1926.

Franzen, Raymond, The Accomplishment Quotient Technique. New York:
Teachers College Contributions to Education, No. 125. Columbia Univer-
sity, New York, 1922.

Greene, H. A., Work-Book in Educational Measurements. New York: Long-
mans, Green and Company, 1930.

Greene, H. A., and Jorgensen, A. N., The Use and Interpretation of Elementary
School Tests. New York: Longmans, Green and Company, 1935.

Odell, C. W., Educational Measurement in High School. New York: The
Century Company, 1930.

Ruch, G. M., The Objective or New-Type Examination. Chicago: Scott, Fores-
man and Company, 1929.

Rugg, Harold O., "Teachers' Marks and Marking Systems," Educational Admin-
istration and Supervision, Vol. 1:117-142, February, 1925.

Smith, H. L., and Wright, W. W., Tests and Measurements. New York: Silver,
Burdett and Company, 1928.

Trabue, M. R., Measuring Results in Education. New York: American Book
Company, 1925.

Wilson, G. M., and Hoke, K. J., How to Measure (Revised). New York: The
Macmillan Company, 1928.

Woody, Clifford, and Sangren, Paul V., Administration of the Testing Pro-
gram. Yonkers: World Book Company, 1932.


APPENDIX 


This appendix contains two types of material supplementary to the
discussions and illustrations in the main body of this volume. The
list of publishers and distributors of tests should be useful in making
a contact with additional types of test materials likely to interest
the industrial education teacher. The glossary of terms used in the
discussion will help to clarify the meaning of some of the more tech-
nical expressions.


APPENDIX A 


PRINCIPAL DISTRIBUTORS AND PUBLISHERS OF TESTS OF 
INTEREST TO INDUSTRIAL EDUCATION TEACHERS 
AND SUPERVISORS 


This Appendix presents a selected list of distributors and pub-
lishers of test material likely to be of interest to industrial education
students, teachers, and supervisors. Obviously this list does not in-
clude many of the important distributors and publishers of tests of
more general interest.


Bruce Publishing Company, Milwaukee, Wisconsin.

Bureau of Educational Research and Service, University of Iowa, Iowa City,
Iowa.

Educational Test Bureau, Minneapolis, Minnesota.

Ginn and Company, Boston, Massachusetts.

Gregory Company, The C. A., Cincinnati, Ohio.

Houghton Mifflin Company, Boston, Massachusetts.

Manual Arts Press, Peoria, Illinois.

Marietta Apparatus Company, Marietta, Ohio.

Public School Publishing Company, Bloomington, Illinois.

Scott, Foresman and Company, Chicago, Illinois.

Smith, Turner E., Atlanta, Georgia.

Stanford University Press, Stanford University, California.

Stoelting Company, C. H., Chicago, Illinois.

Teachers College Bureau of Publications, Columbia University, New York.

World Book Company, Yonkers, New York.




APPENDIX B 
GLOSSARY 


This glossary is appended for the convenience of the student or
teacher who may find that many of the terms used in this treatment
are outside of his experience.


ability. Power to produce; the result of school training and environment oper-
ating on capacity.

accomplishment. Used synonymously with achievement or production.

age norms. Typical performance of subjects grouped by age groups. Usually
expressed as the average of actual performance of subjects of different age
groups.

age scores. The age equivalents assigned to given point scores on tests pro-
vided with age norms.

alternate response. Used in describing any objective test exercise in which the
subject must choose between two possible answers, one of which is right
and the other of which is wrong. See true-false.

ambiguity. A lack of clearness or definiteness in the statement of a fact or a
test item.

analytical test. A test which, by taking cross-sections of abilities related to
total accomplishment in a subject, furnishes a basis for an analysis of the
underlying skills but does not necessarily reveal their interrelationships or
causes of weakness.

aptitude. Predisposition for successful achievement in a given field.

arithmetic mean. A measure of central tendency commonly called the average.

array. A collection of data usually arranged around a particular function.

assumed mean. The mid-point of the class interval taken as the zero point in
laying off deviations in computing the arithmetic mean from a frequency
distribution.

capacity. Power to learn or to profit from training.

character traits. Qualities of the individual such as mentality, honesty, morality,
sense of humor, sympathy, etc., which make up personality.

chronological age. The life age of an individual.

classification of pupils. The placement of pupils in a school system in groups
by grades or ages for more economical instruction.

. . . each category with the frequency . . . traits were completely unrelated.

composite score. A single value used to express the results obtained from a
number of different measures.

comprehension. The degree of understanding of an exercise or material read.

conditions. Factors causing variations in testing or experimental situations.

correction. A remedy or adjustment. Also in a technical sense in connection
with the computation of the arithmetic mean.




correction for chance. In alternate- or multiple-response tests there is a cer- 
tain opportunity for guessing to enter. The correction for chance is the 
adjustment for guessing. 

corrective. Used synonymously with remedial. Implies the remedying of ob- 
served defects or difficulties. 

correlation. The relation between two or more series of measures of the same 
individuals or items. 

criterion. The standard by which the validity of measurement may be de- 
termined. 

diagnosis. Exact identification and location of strengths and weaknesses. 

diagnostic test. A test sufficiently reliable and detailed in content to identify
and reveal individual pupil weaknesses. 

difficulty. When used in reference to test items it implies a large percentage of 
incorrect responses. 

discrimination. The quality in a test or test item which enables it to dis- 
tinguish adequately between varying levels of ability. 

educational guidance. A program designed to direct pupils into school activities
in which they are likely to succeed and find most profit, and away from
fields in which difficulties and failures are almost certain to be encountered
by the child.

error of grouping. A variable error entering into the tabulation of data in fre- 
quency distributions. Brought about through the practice of placing to- 
gether in class intervals measures which may be widely unlike. 

error of sampling. The result of using a too limited number of cases as being 
typical of a large group. 

essay-type test. See traditional examination.

exercise. A unit of work in a test governed by a specific set of directions. 
Sometimes used in the sense of a stimulus for drill. 

extraversion. The process of being interested in and stimulated by persons and 
things outside of oneself. 

fore exercise. A preliminary or practice exercise for the purpose of giving the 
pupil experience with the specific test situation. 

form. Used to distinguish between two or more closely equivalent arrange- 
ments of similar but not identical test items. 

frequency. The number of measures in a given interval or tabulation. Fre-
quently indicated by the symbol f.

frequency table. A distribution showing the number of measures assigned to
successive class intervals.

fulcrum. The axis upon which a lever is supported and rotated.

general ability. Same as general intelligence. A test of general capacity. Con-
trasted with achievement.

grade equivalent. The grade or fraction of a grade nearest which a pupil's test
score places him when compared with the grade norms for the test.

grouping. The process of classifying data into certain categories.

group test. A test designed for administration to a number of individuals at
the same time.

home mechanics. A term used to describe manual tasks arising from the main-
tenance and repair of household articles and equipment.

individual differences. Observed or measured unlikenesses in pupils in capacity,
ability, etc.




informal test. A teacher-made instrument as contrasted with a standardized 
test. 

intelligence. The power to learn, or to profit from training. 

I.Q. Intelligence quotient. An index expressing relative brightness as the ratio
of mental age to chronological age. 

interpolation. The process of locating an intermediate point between two known 
points in accordance with the operation of laws conditioning the case at 
hand. 

interpretation. The explanation of results and the application of same to a con- 
crete situation. 

interval. Used interchangeably with step in preparing a frequency table.

introversion. The process of having one's interests turned in oneself. 

manipulative tests. Performance tests in which the subject turns out an objec- 
tive product as a result of planning and tool operation. 

matching-type test. A type of test item in which the stimulus and response 
forms are presented in parallel columns for convenience in recording the 
identification. 

mean. The arithmetic mean. The point on a scale of values about which the 
deviations are least. 


median. A common measure of central tendency. See definition in the text. 

mental ability. The power to learn. 

mental age. The mental ability of an individual expressed in terms of the age 
of an average individual having that ability. 

mid-point. The exact middle of a step in a frequency table.

multiple-choice test. A type of objective test made up of exercises arranged in
such a way that the subject must select one or more correct responses from
a group of possible responses.

N. A symbol used to indicate the number of cases in an array.

negative correlation. A relationship in which large values of the one variable
are always accompanied by small values in the other.

normal. Typical; making regular progress or development.

norms. Representations of the typical or average performance of subjects of
different age or grade groups. Usually based on a large number of cases.

objective. A term used in describing tests in which no opportunity for dis-
agreement as to correctness of response exists.

objectives. Used in the discussion of curriculum construction as synonymous
with outcomes.

percentile. The points which divide the total number of cases in a given fre-
quency distribution into 100 equal parts.

performance. Achievement. Also used to distinguish test scores as such from
ability or capacity.

personality inventory. A personal rating device from the results of which certain
personality characteristics are revealed.

power tests. Tests which express achievement in terms of the difficulty of the
task which the subject is just able to perform.

practice effect. Increase in a test score due to previous experience with the test.

product scale. A measuring device listing variable characteristics on which judg-
ments are to be based.

prognostic test. A test designed to predict probable future achievement on the
basis of present performance.




quality scale. A device which measures by comparison with a set of standard
specimens the result of applying some specific skill. 

quartiles. The result of dividing a distribution into quarters. 

random sampling. A selection of cases on a purely chance basis.

range. Scale difference between extremes of an array. 

rank. Position assigned to a score in a series. 

rate test. A device which measures achievement in terms of the number of 
tasks of uniform difficulty which can be performed in a specified time. 

rating scales. Measuring devices which set up levels of qualities or products 
for the guidance of judges in evaluating such qualities or products in the 
classroom or shop. 

recall test. A test or exercise which calls for the subject to state the answer 
rather than to recognize it among several possible responses. 

recognition test. A test or exercise in which the student merely identifies the 
correct form of response from several possibilities. 

relative rank. Position assigned to scores in a limited array. 

reliability. A technical expression of the consistency with which a measuring 
instrument performs. 

reliability coefficient. An index found by the process of correlation indicating
the relation which may be expected between successive administrations of
the same measuring instrument.

remedial. Material and devices which are designed to correct existing weaknesses 
in learning or mastery. 

retarded. Used to imply school progress or mental or educational development
which is slower than is expected of the normal subject. 

scores. A description of the performance of a subject. 

sigma. Synonymous with standard deviation. 

standard deviation. A common measure of variability of scores. 

standardization. The process of refining a test and setting up objective goals of 
performance. 

standards. Ultimate goals of achievement. Mistakenly used synonymously with
norms which imply actual levels of accomplishment.

subjectivity. The degree to which measurement results are affected by personal
factors or judgments.

survey tests. Tests which have for their main purposes the measurement of
abilities in terms of broad general functions.

tabulation. The process of classifying data in tables for condensation and inter-
pretation.

teacher's marks. The personal evaluation of the pupil's accomplishment in a
specific field of activity assigned by the classroom or shop teacher.

technique. Skill in executing tool or machine operations.

test. Any type of measuring device by which a numerical expression of the
pupil's performance is secured.

traditional examination. Examinations or tests of the non-objective or discus-
sion type.

training. The learning opportunity afforded through school, shop, or other life
contacts.

true-false test. A recognition-type test in which the individual is called upon
to determine the truth or falsity of items.

T-scores. A derived test score based on the standard deviation unit.




unit of measurement. The quantity or quality used as the basis for expressing 
differences. 

validity. A term used to express the degree to which a measuring instrument 
measures the thing it purports to measure. 

vocational guidance. A program designed to direct individuals into vocational
activities for which they are suited and away from activities for which they
are not suited.


zero point of a scale. The point of origin of the instrument. 


INDEX 


Ability, defined, 225 
Absolute ranks, 219-221 
Accomplishment quotient (A. Q.), 235 


Accuracy exercises, 117-118 
Achievement tests, industrial educa- 
tion, 63 


Administering tests, 58-59 
Administrability, 37 
Age-norms, 232
Alienation, coefficient of, 217
Allport, F. H. and G. W., 185
Anderson, Rose, 77, 90 
Aptitude, measurement of, 83-88
Arithmetic mean, 193, 198-199 
definition of, 194 
illustration of calculation of, 195 
summary of steps in calculation of, 
194-195 
Art, Judgment, 90 
Ayres Handwriting Scale, 100 


Badger, A. J., 73 

Badger Test in Mechanical Drawing, 
69 

Baker, Harry J., 89 

Ballenger, H. L., 96 

Bangs, C. W., 240 

Bernreuter Personality Inventory, 174- 
175 

Bibliographies, 9, 16-17, 30, 41-42, 53, 
62, 73-74, 89-90, 105, 129-130, 149, 
171, 185-186, 224, 240-241 

Binet-Simon, 80 

Bird, Verne A., 89 

Bixler, H. H., 99, 105 

Board, Edna, 89 

Book, W. F., 185 

Brewer, John M., 30 

Brueckner, L. J., 240 

Buckingham, B. R., 129, 240 


Capacity, defined, 226 
Carpenter, J. E., 89 
Castle, D. W., 73 
Castle Mechanical Drawing Test, 70 
Central tendency, measures of, 193-199 
Chance factor in objective tests, 123- 
125 
Chapman, J. C., 72 
Character traits, 172-174 
measuring, 174-177 
rating scales for, 178 
Christy, E. W., 73 
Class grades, use of sigma for analysis 
of, 207-209
Class-intervals, 189 
Cleeton, Glen U., 73 
Coefficient of correlation, meaning of, 
216-217 
uses of, 218 
Columbia Research Bureau Algebra
Test, 102
Columbia Research Bureau Physics
Test, 103
Compass Arithmetic Tests,
diagnostic, 102
survey, 102
Composition, measurement of, 95
Cornell, E. L., 186
Correction for guessing, 124
Correlation, coefficient, meaning of,
212-213
computation of Pearson Product-Moment,
213-216
meaning of, 216-217
table, 215
uses of, 218
Course in woodworking, objectives of,
133
Coxe, W. W., 186
Cram, Fred D., 96




Crawford, J. R., 38 
Criteria for tests, 31-40 
Crockett, A. C., 89 


Deciles, 220 
Derived scores, 231-235 
Deviation, see Variability 
Diagnosis, class, 22-24 
individual, 25-27 
Diagnostic tests, 14 
Difficulty of exercises, 137 
Discussion exercises, 9, 16, 30, 41, 52, 
62, 73, 89, 104, 129, 148, 170, 185 
Dolch, E. W., 186 
Donson, George C., 73 
Double-entry or correlation table, 215 
Downey, June, 186 
Dragoo, Alva W., 186 
Drawing samples, rating of, 4-6 


Economy in testing, 39 
Educational age, 232 
Educational quotient (E. Q.), 234-235 
Educational tests, classification of, 13- 
15 
Elliott, Edward C., 9 
English tests, 94-97 
Ericson, Manuel E., 171 
Error of grouping, 188-189 
Essay-type tests, 10 
constructing, 127 
scoring, 127 
types of exercises, 125-127 
Evaluation of tests, 40 
Exercises, difficulty of, 137 


Farwell, W. H., 103 
Fischer, Ferdinand A. B., 73
Fischer Mechanical Drawing Tests, 70-
71


Flaherty, E. B., 73 
Flam, August, 74 
Foster, R. R., 129 
Franzen, R., 240 
Freeman, F. N., 89, 101
Freeman Diagnostic Handwriting Scale,
101
Frequency table, steps in making, 188-
193
Fryklund, Verne C., 89




Garrett, H. E., 149, 224 
Good, Warren R., 224 
Gordon, Geo., Jr., 89
Gradation of pupils, 27-29 
Grade norms, 231-232 
Grading system, 207-209, 235-239 
Grammar, measurement of, 95-96 
Greene, Charles E., 129 
Greene, H. A., 16, 30, 41, 62, 93, 96,
102, 105, 111, 129, 224, 240 

Gregory, C. A., 62, 224 
Group tests of mental ability, 77-79 
Grouping of test scores, 187 

error of, 188-189 
Guessing in objective test, 123-124 
Guidance, 27-29 


Hager, Carl J., 171 

Haggerty, M. E., 105 

Haggerty Reading Examination, 92 

Handwriting scales, 99-101 

Hartson, L. D., 185 

Hawkes, Herbert E., 102 

Henig, M. S., 89

Herring, J. P., 89

Hierstedt, W. G., 74

Hoke, K. J., 17, 30, 42, 62, 105, 130,
241 

Holzinger, Karl J., 224 

Horn, Ernest, 99, 105 

Horning, S. D., 74

Hudelson, Earl, 98, 105 

Hughes, W. Hardin, 186

Hull, C. L., 89 

Hunter, W. T., 9, 14, 56, 74, 103, 105

Hunter Shop Tests, 68-69 


Identification exercises, 118-120 
Industrial arts tests, standardized, 64- 
68 
Industrial education, defined, 1 
desirable traits in, 179 
measurable factors, 43-53 
measurement in, 1 
Informal objective tests, difficulty of 
items in, 137 
rearranging items in, 137 
samples of, 139-144 
securing objectivity, 136 
securing reliability, 138 




Informal objective tests, securing 
validity, 131-136 
steps in building, 131 
Intelligence, defined, 15, 75 
measurement of, 75 
methods of measuring, 76 
tests of, 77-80 
Intelligence quotient, 81-82, 233-234 
Intelligence tests, 15, 77-80 
Iowa Elementary Language Test, 96 
Iowa Grammar Information Test, 96 
Iowa Silent Reading Test, 93 
Items, difficulty of, 137 
rearranging, 137 


Johnson, H. J., 74 

Jones, Arthur J., 186 

Jorgensen, A. N., 16, 30, 41, 62, 93, 105, 
129, 224, 240 


Keane, F. L., 89 

Kelley, T. L., 102, 149, 171, 224 

Kelley, V. H., 93 

Kelly, F. J., 9, 128 

Kilzer, L. R., 103 

Kilzer-Kirby Inventory Test, 103 

Kirby, T. J., 103, 105 

Kirby Grammar Tests, 96 

Knight, F. B., 102

Kramer, Edna E., 224

Kroll, H. W., 74

Kuhlmann, F., 77, 78, 90

Kuhlmann-Anderson Intelligence
Tests, 77 


Laird, Donald A., 186 

Lane, Ruth E., 105 

Lane-Greene Unit-Achievement Tests, 
103 

Lang, A. R., 16, 129 

Leavitt, F. M., 9

Lindquist, E. F., 224 

Loofbourow Keys Personal Index, 175- 
177 


McCall, W. A., 62
McClusky, F. D., 186 
McKay, H. D., 186 
MacQuarrie, T. W., 90 
Madsen, I. N., 90 




Manger, Emerson W., 171 
Manipulative tests, administering, 58- 
59 
scoring, 59, 61 
Mansperger, D. E., 9
Marking system, objectifying the, 235- 
239 
Marsh, Willa, 89 
Mastery, factors related to, 43-53 
Matching exercises, 113-114 
Mathematics, measurement of, 101-103 
Mean, arithmetic, calculation of, 194- 
195 
Measurable factors, 43-52 
Measurement, significance of, 1 
Mechanical aptitude, measuring, 83-88 
Mechanical drawing tests, scales, 153- 
155 
tests, 69-72 
Mechanical features of tests, 39 
Median, calculation of, 195 
definition of, 196 
uses of, 198-199 
Meier, Norman C., 90 
Mental ability, group tests of, 77-79 
individual test of, 79-80 
meaning of, 75 
measuring, 76 
Mental age, 81 
Mental test score, meaning of, 82 
Mid-measure, calculation of, 196 
definition of, 196 
Minnesota Mechanical-Ability Tests,
84-87
Mitchell, B. C., 224
Monroe, W. S., 16, 30
Morris, Elizabeth H., 186
Motivation of learning, 21
Multiple-response exercises, 109-110
Murbach, Nelson J., 74


Nash, Harry B., 9, 20, 56, 74 

Nash-Van Duzee Mechanical Drawing, 
71-72 

Nash-Van Duzee Woodwork Test, 64- 
66 

New Stanford Achievement Tests, 102

Newkirk, L. V., 9, 14, 30, 74, 115, 186 

Newkirk-Stoddard Home Mechanics 
Test, 32, 66-67 




Non-standardized industrial arts tests,
68-69
Norms, 21
and standards, 227-231
kinds of, 230
meaning of, 228
specific uses of, 229-230


Objective exercises, classification of,
107-115
samples of, 108-115
Objective tests, 10
samples of, 139-144
standardized and informal, 13
Objectivity, 36-37, 136
O’Conner, Johnson, 89
Odell, C. W., 16, 30, 41, 62, 76, 129,
149, 186, 224, 240
Orleans, J. S., 129, 186
O'Rourke, L. J., 90
Otis, A. S., 40, 78, 90, 102, 224
Otis Group Intelligence tests, 78
Otis Reasoning Tests in Arithmetic,
102
Otis Self-Administering Test, 78
Otis Test-rating scale, 40


Paterson, D. G., 111, 149, 171
Peckstein, L. A., 89
Percentile scores, 232
Performance, defined, 225
exercises, 116-117
Performance tests, 144-148
scoring, 147
steps in preparing, 145-147
Personality and character traits, constructing
scales for, 180-182
measuring, 174-177
rating scales for, 178
sample scales for, 183, 184
Piper, A. H., 102, 105
Pressey, L. C., 105
Pressey Technical Vocabularies, 97
Project rating scale, 151, 152
sample of, 153, 155
using, 156


Q as measure of variability, 201
Quality exercises, 117-118
Quality scales, 157
reliability of, 168
steps in making, 158-168
Quartiles, 201-202 
Quotients, 233-235 


Range, as measure of variability, 201 
of scores, 188 
Ranks, absolute, 219-221 
assignment of, 218 
relative, 218
Rating, character, 178 
drawing samples, 4-6 
scales for, 150 
sheet-metal projects, 7 
shop projects, 4-8 
woodwork samples, 3-5 
Reading tests, 91-94 
Rearrangement exercises, 114-115 
Recall exercises, 107-109 
Recognition type exercises, 109-114 
Reedy, Caroline M., 90 
Relationship, measures of, 211-218 
Relative ranks, 218 
Reliability, 34-36 
Research uses of tests, 29 
Rice, G. A., 149
Ruch, G. M., 17, 30, 41, 62, 102, 107, 
124, 128, 129, 130, 149, 240 
Ruch-Popenoe General Science Tests, 
103 
Rugg, H. O., 224, 240


Sampling, effect on reliability, 35
illustration of, 35
Sanford, Vera, 102
Sangren, P. V., 62, 241
Scale differences, determining, 160-166
Scales, test-rating, 40
training in use of, 60
Scaling test items, 210
School marks, 2, 207-209, 235-239
Schorling, Raleigh, 102
Science, measurement of, 103
Scoring tests, 59, 61
Sealy, G. A., 129 
Seashore, C. E., 90 


Selected references, 9, 16-17, 30, 41-42,
53, 62, 73-74, 89-90, 105, 129-130,
149, 171, 185-186, 224, 240-241
Sheet-metal projects, rating of, 7




Shop projects, rating of, 4-8 
Sigma, see Standard Deviation 
Simmons, E. P., 99, 105 
Simmons-Bixler Spelling Scale, 12 
Smith, H. L., 17, 30, 42, 62, 90, 105, 130, 
241 
Smith, Homer J., 9 
Speed exercises, 121-123 
Spelling, importance of, 98 
Seven-S scales, 98 
Simmons-Bixler Scales, 12, 99
Standard deviation, as measure of vari- 
ability, 202-203 
computation of, 204 
definition of, 202 
meaning of, 203-204 
uses of, 207-211
Standardized and informal tests, 13 
Standards vs. norms, 38 
Stanford Revision of Binet-Simon, 80 
Starch, Daniel, 9 
Statistical problems, 222-224
Stenquist, John L., 75, 90
Stenquist Assembling Tests, 87
Stenquist Mechanical Aptitude Tests,
88
Stockwell, Lynn E., 89 
Stoddard, G. D., 14, 17, 30, 42, 62, 74, 
115, 224 
Stoy, E. G., 90 
Studebaker, J. W., 102 
Sutherland, S. S., 90 
Swope, Ammon, 9 
Symonds, P. M., 17, 30, 130 




Tabulation methods, 187-193
Teacher’s marks, 2, 235-239 
Technique exercises, 120-121 
Terman, L. M., 90, 102 
Terman Group Test of Mental Ability, 
79 
Test-rating scale, 40
Test scores, meaning of, 225-227
Testing techniques, 106-115 
Tests, characteristics of, 10 
equivalence of forms, 40 
kinds of, 13-16 
meaning of, 12 
mechanical features of, 39 




Tests, related to instruction, 18 
responsibility for giving, 54 
scales, 12 
uses of, 19-29 
when to give, 55 

Thorndike, E. L., 101 

Thurstone, L. E., 171 

Toops, H. A., 130

Trabue, M. R., 62, 224, 241 

Trade tests, 72 

True-false tests, 110-113 

T-scores, 209-210 


Unit achievement tests, in algebra, 102 
in plane geometry, 103 


Valentine, B. F., 186
Validation of tests, 32-34, 131-136
Validity, 31-34, 131-136
Van Duzee, Roy R., 9, 20, 56, 74
Variability, measures of, 200 
Q as measure of, 201-202 
range as measure of, 201 
sigma as measure of, 202-203 
Variables, controlling, 57 
Vocabulary tests, 97 


Weaver, C. G., 74 
Weidemann, C. C., 130 
Wells, G. K., 74 
Wells-Laubach Industrial Arts Tests, 
68 
Willing, M. H., 95, 105 
Willing Composition Scale, 95 
Wilson, G. M., 17, 30, 42, 62, 105, 130, 
241
Wilson Language Error Test, 97 
Wood, Ben D., 17, 102, 103, 130 
Woodworking, informational content 
of, 45 
objectives of course in, 133-135 
rating of samples in, 4-8 
sample of test, 140-144 
Woody, Clifford, 62, 241 


Work-Book in Educational Measurements,
193, 196, 198, 207, 218, 224
Wright, W. W., 17, 30, 41, 62, 90, 105, 
130 
Writing, 99-101 


Zero point, 166 




